Network selection method and apparatus for integrated cellular and drone-cell networks

ABSTRACT

Provided are a network selection method and an apparatus for integrated cellular and drone-cell networks. The method includes: acquiring a dynamic network model and a dynamic user model; generating a random event vector according to the dynamic network model and the dynamic user model; generating an action vector according to the random event vector; obtaining an individual utility of each user according to the action vector and the random event vector; constructing a first selection model; and obtaining the value of the action probability according to the first selection model, so as to determine, according to the value of the action probability, the networks that the users choose to access.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Chinese Patent Application No. 201811353219.4, filed on Nov. 14, 2018, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to the field of network selection, and in particular to a network selection method and an apparatus for integrated cellular and drone-cell networks.

BACKGROUND

In order to provide better network data service and alleviate the congestion of the cellular network, drone-cells (i.e., low-altitude drones equipped with transceivers) can be used to offload traffic from the congested cellular network. To make full use of the resources of drone-cells, a significant challenge is how to make efficient and fair network selection for integrated cellular and drone-cell networks.

One approach to the network selection problem is the game-theory-based approach, in which the network selection problem is first modeled as a game and a centralized or distributed approach is then exploited to achieve an equilibrium. For example, Cheung et al. formulated the network selection problem as a Bayesian game under the condition that the mobility information of users is incomplete, and then proposed a distributed approach with good convergence properties to achieve the Bayesian Nash equilibrium.

The existing game-theory-based approaches consider the interaction and competition among users. However, most of them study the network selection problem under quasi-static or predictable network states, whereas the integrated cellular and drone-cell networks are highly dynamic and their network state is hard to predict. Therefore, it is difficult for the existing game-theory-based approaches to solve the network selection problem of the integrated cellular and drone-cell networks.

SUMMARY

The present disclosure provides a network selection method and an apparatus for integrated cellular and drone-cell networks, aiming at solving the problem that the existing game-theory-based approaches cannot solve the network selection problem for the integrated cellular and drone-cell networks because these networks are highly dynamic and their network state is hard to predict.

In a first aspect, the present disclosure provides a network selection method for integrated cellular and drone-cell networks, including:

acquiring a dynamic network model and a dynamic user model, where the dynamic network model includes at least a location model of the drone-cell networks, a capacity model of the cellular network and a capacity model of the drone-cell networks, and the dynamic user model includes at least a location model and a transmission rate model of users;

generating accessible network sets of the users according to the location model of the drone-cell networks and the location model of the users;

generating a random event vector according to the capacity model of the cellular network, the capacity model of the drone-cell networks, the accessible network sets of the users and the transmission rate model, where the accessible network sets of the users include the drone-cell networks and/or the cellular network;

generating an action vector according to the random event vector, where the action vector is used for indicating that the users choose to access the drone-cell networks and/or the cellular network;

obtaining an individual utility of each user according to the action vector and the random event vector;

constructing a first selection model, where the first selection model includes a first objective function and a first constraint, where the first objective function is a proportional fairness function with the time average of the individual utilities of the users as independent variables, and the first constraint includes at least a first coarse correlated equilibrium constraint, a first minimum individual time average utility constraint and a first action probability constraint, where the first coarse correlated equilibrium constraint is used for constraining the time average of the individual utilities and first auxiliary variables, the first minimum individual time average utility constraint is used for constraining the time average of the individual utilities, and the first action probability constraint is used for constraining the action probability under the condition of the random event vector;

obtaining the time average of the individual utilities according to the individual utilities, the random event probability and the action probability under the condition of the random event vector, where the action probability under the condition of the random event vector is the probability that the users execute the action vector under the condition that the random event vector occurs, and the random event probability is the probability that the random event occurs; and

obtaining a value of the action probability according to the first selection model, and determining the networks that the users choose to access according to the value of the action probability.

The network selection method provided by the present disclosure includes acquiring the dynamic network model and the dynamic user model, generating the random event vector according to the dynamic network model and the dynamic user model, constructing the first selection model according to the random event vector and the action vector, obtaining the value of the action probability according to the first selection model, and determining the networks that the users choose to access according to the value of the action probability. It thereby solves the problem that the existing game-theory-based approaches cannot handle network selection for the integrated cellular and drone-cell networks, which are highly dynamic and whose network state is hard to predict.

In a second aspect, the present disclosure provides a network selection apparatus for integrated cellular and drone-cell networks, including:

a transceiver, configured to collect capacity information of the drone-cell networks, capacity information of the cellular network, set information of accessible networks of the users, and data transmission rate information, and to transmit action vector information to the users so that the users can determine the networks to access according to the action vector information; and

a processor, configured to generate the action vector information according to the capacity information of the drone-cell networks, the capacity information of the cellular network, the set information of the accessible networks of the users, the data transmission rate information and a fourth selection model.

The fourth selection model requires that the difference between the drift of the total violation and a utility be less than or equal to a penalty upper bound, where

the drift of the total violation is obtained according to the value of the total violation at a current time slot and the value of the total violation at a next time slot;

the value of the total violation at the current time slot is obtained according to a first virtual value at the current time slot, a second virtual value at the current time slot and a third virtual value at the current time slot;

the first virtual value in the first virtual queue at the current time slot is generated according to the violation of second coarse correlated equilibrium constraints at a previous time slot and the first virtual value in the first virtual queue at the previous time slot;

the second virtual value in the second virtual queue at the current time slot is generated according to the violation of third auxiliary variable constraints at the previous time slot and the second virtual value in the second virtual queue at the previous time slot;

the third virtual value in the third virtual queue at the current time slot is generated according to the violation of a second minimum individual time average utility constraint at the previous time slot and the third virtual value in the third virtual queue at the previous time slot, where the first virtual value, the second virtual value and the third virtual value at an initial time slot are all zero;

a third selection model includes a third objective function and a third constraint, where the third objective function is the time average expectation of a proportional fairness function with third auxiliary variables as independent variables, and the third constraint includes at least the second coarse correlated equilibrium constraints, the second minimum individual time average utility constraint, a second auxiliary variable constraint, and the third auxiliary variable constraints, where the second coarse correlated equilibrium constraints are used for constraining the time average expectation of the individual utilities and the time average expectation of second auxiliary variables, the second minimum individual time average utility constraint is used for constraining the time average expectation of the individual utilities, the second auxiliary variable constraint is used for constraining the second auxiliary variables, and the third auxiliary variable constraints are used for constraining the time average expectation of the third auxiliary variables and the time average expectation of the individual utilities;

the first selection model includes a first objective function and a first constraint, where the first objective function is a proportional fairness function with the time average of the individual utilities as independent variables, and the first constraint includes at least a first coarse correlated equilibrium constraint, a first minimum individual time average utility constraint and a first action probability constraint, where the first coarse correlated equilibrium constraint is used for constraining the time average of the individual utilities and first auxiliary variables, the first minimum individual time average utility constraint is used for constraining the time average of the individual utilities, and the first action probability constraint is used for constraining the action probability under the condition of a random event vector;

the time average of the individual utilities is obtained according to the individual utilities, a random event probability and the action probability under the condition of the random event vector, where the action probability under the condition of the random event vector is the probability that the users execute the action vector under the condition that the random event vector occurs;

an individual utility of each user is obtained according to the action vector and the random event vector; and

the action vector is generated according to the random event vector, and the random event vector is generated according to a capacity model of the cellular network, a capacity model of the drone-cell networks, accessible network sets of the users and a transmission rate model, where the accessible network sets of the users are generated according to a location model of the drone-cell networks and a location model of the users.
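The per-slot bookkeeping behind the fourth selection model can be illustrated with a short sketch. It rests on assumptions that the text above does not state: each virtual queue follows the usual Lyapunov-optimization update Q(t+1) = max[Q(t) + y(t), 0], with y(t) the constraint violation in slot t; the total violation is taken as half the sum of squared virtual values; and the utility is weighted by a hypothetical trade-off parameter `V`. The function names and the queue names `G`, `Z`, `H` are illustrative only.

```python
from typing import List, Sequence, Tuple


def update_queues(queues: Sequence[float], violations: Sequence[float]) -> List[float]:
    """One slot of virtual-queue evolution (assumed form):
    Q_k(t+1) = max(Q_k(t) + y_k(t), 0), where y_k(t) > 0 means that
    constraint k was violated in slot t."""
    return [max(q + y, 0.0) for q, y in zip(queues, violations)]


def total_violation(*queue_groups: Sequence[float]) -> float:
    """Assumed quadratic total violation: half the sum of squared virtual
    values over all groups (first, second and third virtual queues)."""
    return 0.5 * sum(q * q for group in queue_groups for q in group)


def drift_minus_utility(current: Tuple[Sequence[float], ...],
                        nxt: Tuple[Sequence[float], ...],
                        utility: float, V: float = 1.0) -> float:
    """Drift of the total violation from the current slot to the next one,
    minus V times the utility; V is a hypothetical trade-off weight."""
    drift = total_violation(*nxt) - total_violation(*current)
    return drift - V * utility


# Usage: all virtual values start at zero in the initial slot.
G0, Z0, H0 = [0.0, 0.0], [0.0], [0.0]
G1 = update_queues(G0, [0.3, -0.2])   # violations of the CCE constraints
Z1 = update_queues(Z0, [0.1])         # violation of an auxiliary-variable constraint
H1 = update_queues(H0, [-0.5])        # minimum-utility constraint satisfied
```

Negative violations drain a queue but never push it below zero, so a queue only grows while its constraint keeps being violated.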

In the network selection method and the apparatus provided by the present disclosure, the network selection method includes acquiring the dynamic network model and the dynamic user model, generating the random event vector according to the acquired dynamic network model and dynamic user model, constructing the first selection model according to the random event vector and the action vector, obtaining the value of the action probability according to the first selection model, and determining the networks that the users choose to access according to the value of the action probability.

The present disclosure constructs a dynamic network model and a dynamic user model that simulate the highly dynamic drone-cell-user connections, the fluctuation of network capacity, the time-varying user traffic, and so on. The present disclosure formulates the network selection problem as a repeated stochastic game, which captures the competition and interaction among the users well. The method can maximize the total user utility while ensuring fairness among the users. It thereby solves the problem that the existing game-theory-based approaches cannot handle network selection for the integrated cellular and drone-cell networks, which are highly dynamic and whose network state is hard to predict.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a scenario diagram of a network on which a network selection method according to the present disclosure is based;

FIG. 2 is a flowchart illustrating a network selection method according to an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a repeated stochastic game structure obeyed by the network selection method according to the embodiment shown in FIG. 2;

FIG. 4 is a schematic structural diagram of a network selection apparatus for the integrated cellular and drone-cell networks according to an exemplary embodiment of the present disclosure;

FIG. 5 is a flowchart of the method performed by a processor in the network selection apparatus for the integrated cellular and drone-cell networks according to the embodiment shown in FIG. 4 of the present disclosure;

FIG. 6 is a schematic diagram illustrating the change of stability variables over time for the network selection method proposed by the present disclosure when the number of users is N=50 and the number of drones is M_(d)=6;

FIG. 7 is a schematic diagram illustrating the effect of the number of users N on the running time of the network selection method proposed by the present disclosure when the number of drones is M_(d)=6;

FIG. 8 is a schematic diagram illustrating the effect of the number of drones M_(d) on the running time of the network selection method proposed by the present disclosure when the number of users is N=50;

FIG. 9 is a schematic diagram illustrating the effect of the number of users N on the total user utilities obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of drones is M_(d)=6;

FIG. 10 is a schematic diagram illustrating the effect of the number of drones M_(d) on the total user utilities obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of users is N=50;

FIG. 11 is a schematic diagram illustrating the effect of the number of users N on the Jain's fairness index obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of drones is M_(d)=6; and

FIG. 12 is a schematic diagram illustrating the effect of the number of drones M_(d) on the Jain's fairness index obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of users is N=50.

DETAILED IMPLEMENTATION OF THE EMBODIMENTS

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be described hereunder clearly and comprehensively with reference to the accompanying drawings. Apparently, the described embodiments are only a part of the embodiments of the present disclosure, rather than all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without any creative effort shall fall into the protection scope of the present disclosure.

FIG. 1 is a scenario diagram of a network on which a network selection method according to the present disclosure is based. As shown in FIG. 1, the present disclosure investigates a network selection scenario for the integrated cellular and drone-cell networks. In this scenario, a cellular network 103 is constructed to provide radio access to a collection of users $\mathcal{I}=\{1,2,\ldots,N\}$ moving randomly and independently in a geographical area of L×W m² over an infinite sequence of time slots t∈{0, 1, 2, . . . }. Meanwhile, a set of drone-cells is deployed to alleviate the congestion of the cellular network. The present disclosure assumes that every drone-cell connects directly to a ground station, moves independently, and is deployed at the same fixed altitude h. Let $\mathcal{M}=\{0,1,2,\ldots,M\}$ be the network set, where j=1 denotes the cellular network, $j\in\mathcal{M}_d=\{2,\ldots,M\}$ denotes the j-th drone-cell network, and j=0 represents an empty network indicating that a user does not access any network. At each time slot t, a user can select a network to access from its accessible network set. The present disclosure does not consider the switching cost incurred when a user changes its network access state.

FIG. 2 is a flowchart illustrating a network selection method according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the network selection method according to the present disclosure includes:

S101: acquiring a dynamic network model and a dynamic user model.

More specifically, the dynamic network model includes at least a location model of drone-cell networks, a capacity model of a cellular network and a capacity model of the drone-cell networks, and the dynamic user model includes at least a location model and a transmission rate model of users.

Regarding network coverage, this embodiment assumes that the cellular network covers the whole considered area, whereas a drone-cell can only cover a small region. Specifically, every drone-cell is assumed to have the same limited coverage radius (denoted by R_(d)). Let r_(ij) denote the horizontal distance between the i-th user and the j-th drone-cell. The i-th user can access the j-th drone-cell if r_(ij)≤R_(d); otherwise, it cannot.

For the location model of the drone-cell networks, this embodiment introduces a smooth turn mobility model with reflecting boundaries. In this model, each drone flies along a smooth, random trajectory without sharp turns. Specifically, a drone is assumed to have a constant forward speed V_(d) (in m/s) and to change its centripetal acceleration randomly. The duration (in seconds) for which the drone maintains its current centripetal acceleration follows an exponential distribution with mean 1/λ_(d). Meanwhile, the reciprocal of the turning radius (in m) of the drone-cell follows a Gaussian distribution with zero mean and variance σ_(d)².
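A per-slot simulation of the smooth turn model could look like the following minimal sketch. It assumes a fixed slot length `dt` (1 s by default), explicit Euler integration of the heading, at most one redraw of the centripetal acceleration per slot, the same centered area [−L/2, L/2]×[−W/2, W/2] as used for the users, and mirror reflection at the edges (heading θ becomes π−θ at an x-edge and −θ at a y-edge). The `Drone` class and the function names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Drone:
    x: float          # horizontal position (m)
    y: float
    heading: float    # direction of flight (rad)
    curvature: float  # reciprocal of the turning radius (1/m)
    hold: float       # seconds left before a new centripetal acceleration is drawn


def reflect(pos: float, half: float):
    """Mirror a coordinate back into [-half, half]; also report whether it crossed an edge."""
    if pos > half:
        return 2 * half - pos, True
    if pos < -half:
        return -2 * half - pos, True
    return pos, False


def smooth_turn_step(d: Drone, v_d: float, lam_d: float, sigma_d: float,
                     L: float, W: float, dt: float = 1.0, rng=random) -> Drone:
    """Advance one drone by dt seconds (explicit Euler, assumed slot length dt)."""
    d.hold -= dt
    if d.hold <= 0:
        d.curvature = rng.gauss(0.0, sigma_d)  # 1/radius ~ N(0, sigma_d^2)
        d.hold = rng.expovariate(lam_d)        # holding time ~ Exp, mean 1/lam_d
    d.heading += v_d * d.curvature * dt        # angular rate = V_d / radius
    d.x += v_d * math.cos(d.heading) * dt      # constant forward speed V_d
    d.y += v_d * math.sin(d.heading) * dt
    d.x, hit_x = reflect(d.x, L / 2)
    d.y, hit_y = reflect(d.y, W / 2)
    if hit_x:
        d.heading = math.pi - d.heading        # mirror the direction at an x-edge
    if hit_y:
        d.heading = -d.heading                 # mirror the direction at a y-edge
    return d
```

Starting with `hold=0.0` forces a turn to be drawn in the first slot. The reflection assumes a drone never overshoots an edge by more than the side length in one slot, which holds whenever V_(d)·dt is smaller than L and W.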

For the capacity model of the networks, this embodiment assumes that the capacity of the cellular network at time slot t, denoted by C₁(t), follows a truncated Gaussian distribution N(μ_(b),σ_(b)²) within the interval C₁(t)∈[μ_(b)−2σ_(b),μ_(b)+2σ_(b)], where 2σ_(b)<μ_(b). Meanwhile, this embodiment assumes that for each drone-cell network $j\in\mathcal{M}_d$, its capacity at slot t, denoted by C_(j)(t), is independent and identically distributed (i.i.d.) and follows a truncated Gaussian distribution N(μ_(c),σ_(c)²) within the interval C_(j)(t)∈[μ_(c)−2σ_(c),μ_(c)+2σ_(c)], where 2σ_(c)<μ_(c).
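Sampling the network capacities only requires a Gaussian truncated at two standard deviations. The sketch below uses simple rejection sampling, which is cheap here because roughly 95% of raw Gaussian draws already fall within ±2σ. The function names and the list layout, with index 0 holding C₁(t) of the cellular network and the following entries holding the drone-cell capacities, are assumptions for illustration.

```python
import random


def truncated_gauss(mu: float, sigma: float, rng=random, width: float = 2.0) -> float:
    """Sample N(mu, sigma^2) restricted to [mu - width*sigma, mu + width*sigma]
    by rejection sampling."""
    lo, hi = mu - width * sigma, mu + width * sigma
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x


def sample_capacities(mu_b: float, sigma_b: float, mu_c: float, sigma_c: float,
                      num_drones: int, rng=random) -> list:
    """Return [C_1(t), C_2(t), ..., C_M(t)]: the cellular capacity first,
    then i.i.d. drone-cell capacities."""
    assert 2 * sigma_b < mu_b and 2 * sigma_c < mu_c  # keeps every capacity positive
    cellular = truncated_gauss(mu_b, sigma_b, rng)
    drones = [truncated_gauss(mu_c, sigma_c, rng) for _ in range(num_drones)]
    return [cellular] + drones
```

The condition 2σ<μ is what guarantees the truncated support stays strictly positive, so the assertion mirrors the assumption stated in the model.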

For the location model of the users, this embodiment develops a boundary Gauss-Markov mobility model based on the Gauss-Markov mobility model. Specifically, based on the Gauss-Markov mobility model, this embodiment considers a user that moves within the rectangular area

$\lbrack {{- \frac{L}{2}},\frac{L}{2}} \rbrack \times \lbrack {{- \frac{W}{2}},\frac{W}{2}} \rbrack$

and is reflected at the boundary. Thus, the location (denoted by l_(u)(t)=(x_(u)(t),y_(u)(t))) and the velocity (denoted by v_(u)(t)=(v_(u)^(x)(t),v_(u)^(y)(t))) of a user in the boundary Gauss-Markov mobility model satisfy the following update formulas:

$\begin{matrix}{{l_{u}( {t + 1} )} = {( {- 1} )^{k} \odot ( {{l_{u}(t)} + {v_{u}(t)} - {k \odot ( {L,W} )}} )}} & (1) \\{{{v_{u}( {t + 1} )} = {{( {- 1} )^{k} \odot \alpha_{u} \odot {v_{u}(t)}} + {\sqrt{1 + \alpha_{u}^{2}} \odot \sigma_{u} \odot {w_{u}(t)}}}}{where}{{k = {( {k_{x},k_{y}} ) = ( {\lfloor {( {{x_{u}(t)} + {v_{u}^{x}(t)} + \frac{L}{2}} )/L} \rfloor,\lfloor {( {{y_{u}(t)} + {v_{u}^{y}(t)} + \frac{W}{2}} )/W} \rfloor} )}}, \odot}} & (2)\end{matrix}$

denotes the Hadamard product, └⋅┘ denotes the floor operation,α_(u)=(α_(ux),α_(uy)) is a two-dimensional (2-D) memory level vector,σ_(u)=(σ_(ux),σ_(uy)) is a 2-D asymptotic standard deviation vector ofthe velocity, the 2-D memory level vector and the 2-D asymptoticstandard deviation vector of the velocity are constants, w_(u)(t)=(w_(u) ^(x)(t),w_(u) ^(y)(t)) represents a 2-D uncorrelated Gaussianprocess, and w_(u) ^(x)(t) and w_(u) ^(y)(t) are independent and areboth zero-mean and unit-variance.
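Because every operation in the boundary Gauss-Markov update is element-wise, it can be implemented axis by axis. This minimal sketch assumes that the per-slot displacement is smaller than the side of the area (so k is −1, 0 or 1) and uses the √(1−α²) noise factor of the standard Gauss-Markov model, under which σ_u is indeed the asymptotic standard deviation of the velocity. `gauss_markov_step` is a hypothetical name.

```python
import math
import random


def gauss_markov_step(pos, vel, alpha, sigma, L, W, rng=random):
    """One slot of the boundary Gauss-Markov model, axis by axis.

    pos, vel, alpha and sigma are (x, y) pairs; the area is
    [-L/2, L/2] x [-W/2, W/2]; assumes |v| < side length so k is -1, 0 or 1."""
    new_pos, new_vel = [], []
    for p, v, a, s, size in zip(pos, vel, alpha, sigma, (L, W)):
        k = math.floor((p + v + size / 2) / size)  # 0 inside, +1/-1 after crossing an edge
        sign = (-1) ** k                            # flips position and velocity on reflection
        new_pos.append(sign * (p + v - k * size))
        w = rng.gauss(0.0, 1.0)                     # zero-mean, unit-variance noise
        new_vel.append(sign * a * v + math.sqrt(1 - a * a) * s * w)
    return tuple(new_pos), tuple(new_vel)
```

For example, a user at x=450 m moving at 100 m/s in a 1000 m wide area would reach 550 m, so it is mirrored back to 450 m and its x-velocity becomes −100 m/s.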

For the transmission rate model of the users, this embodiment assumes that for each user $i\in\mathcal{I}$, the required data transmission rate at time slot t (denoted by R_(i)(t)) is i.i.d. and follows a truncated Gaussian distribution N(μ_(R)(t),σ_(R)(t)²) within the interval R_(i)(t)∈[μ_(R)(t)−2σ_(R)(t),μ_(R)(t)+2σ_(R)(t)], where σ_(R)(t)=ρ_(R)μ_(R)(t) with ρ_(R)<½. Furthermore, μ_(R)(t) is a Markov process taking values in a finite set {μ₁, μ₂, . . . , μ_(K_μ)}, and this embodiment defines its one-step transition probability matrix as $P=[p_{i_\mu,j_\mu}]_{K_\mu\times K_\mu}$, where $p_{i_\mu,j_\mu}$ is the probability that μ_(R)(t) transitions from the current value $\mu_{i_\mu}$ to the value $\mu_{j_\mu}$ at the next time slot.
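The rate model combines a finite-state Markov chain for μ_R(t) with a truncated Gaussian draw for every user. A minimal sketch, assuming rejection sampling for the truncation and a row-stochastic transition matrix given as nested lists; the function names, the example means and the example matrix are hypothetical.

```python
import random


def next_mean_index(i, P, rng=random):
    """Draw the next state of the Markov chain mu_R(t) from row i of P."""
    return rng.choices(range(len(P[i])), weights=P[i])[0]


def sample_rate(mu, rho, rng=random):
    """R_i(t) ~ N(mu, (rho*mu)^2) truncated to [mu - 2*rho*mu, mu + 2*rho*mu]."""
    assert 0 < rho < 0.5       # rho < 1/2 keeps the lower bound mu*(1 - 2*rho) positive
    sigma = rho * mu
    while True:                # rejection sampling
        r = rng.gauss(mu, sigma)
        if abs(r - mu) <= 2 * sigma:
            return r


def rate_trace(means, P, rho, num_users, slots, start=0, rng=random):
    """Per-slot required rates of num_users users; all users share mu_R(t) in a slot."""
    state, trace = start, []
    for _ in range(slots):
        mu = means[state]
        trace.append([sample_rate(mu, rho, rng) for _ in range(num_users)])
        state = next_mean_index(state, P, rng)
    return trace


# Hypothetical example: three mean rates and a "sticky" transition matrix.
means = [1.0, 2.0, 4.0]
P3 = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
```

Since ρ_R<½, every sampled rate stays at least μ(1−2ρ_R)>0, which keeps the later utility computation well defined.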

S102: generating accessible network sets of users according to a location model of the drone-cell networks and a location model of the users; and generating a random event vector according to a capacity model of the cellular network, a capacity model of the drone-cell networks, the accessible network sets of the users and the transmission rate model.

More specifically, this embodiment models the proposed network selection problem as a repeated stochastic game. FIG. 3 is a schematic diagram of a repeated stochastic game structure obeyed by the network selection method according to the embodiment shown in FIG. 2. As shown in FIG. 3, this structure includes an environment 201, a game manager 202 and players 203. This embodiment regards the N users as players 203 and the network selection strategies taken by the users as actions. At each time slot t∈{0, 1, 2, . . . }, each player 203-i ($i\in\mathcal{I}$) can observe a random event ω_(i)(t)∈Ω_(i) from the environment 201, and the game manager 202 can observe the whole random event vector ω(t)=(ω₀(t), ω₁(t), . . . , ω_(N)(t))∈Ω from the environment 201, where ω₀(t)∈Ω₀ represents the random event known only by the game manager 202 and Ω=Ω₀×Ω₁× . . . ×Ω_(N). Specifically, in this embodiment, the random event ω₀(t) known only by the game manager 202 includes the capacity of the cellular network and the capacities of the drone-cell networks, i.e., ω₀(t)=(C₁(t), C₂(t), . . . , C_(M)(t)). For all $i\in\mathcal{I}$, let $\omega_i(t)=\{\mathcal{N}_i(t),R_i(t)\}$, where $\mathcal{N}_i(t)$ represents the set of accessible non-empty networks of the player 203-i, i.e.,

$\mathcal{N}_i(t)=\{j\in\mathcal{M}_d \mid r_{ij}(t)\leq R_d\}\cup\{1\}.$
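Building the accessible network set of a player, and from it the player's action set and local random event, amounts to a coverage test against R_d. The sketch below assumes horizontal positions are given as (x, y) pairs and that the drone positions are listed in the order j = 2, ..., M; the function names are hypothetical.

```python
import math


def accessible_networks(user_xy, drone_xys, R_d):
    """Cellular network (index 1) plus every drone-cell whose horizontal
    distance to the user is at most R_d; drone_xys lists drones j = 2, ..., M."""
    ux, uy = user_xy
    nets = {1}                                   # the cellular network covers the whole area
    for j, (dx, dy) in enumerate(drone_xys, start=2):
        if math.hypot(ux - dx, uy - dy) <= R_d:  # r_ij <= R_d
            nets.add(j)
    return nets


def action_set(nets):
    """Actions available to the player: accessible networks plus the empty network 0."""
    return nets | {0}


def local_event(user_xy, drone_xys, R_d, rate):
    """Player i's own random event: (accessible networks, required rate R_i(t))."""
    return frozenset(accessible_networks(user_xy, drone_xys, R_d)), rate
```

A `frozenset` is used for the local event so that it can serve as a dictionary key when probabilities are later tabulated per event.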

S103: generating an action vector according to the random event vector.

More specifically, the action vector is used for indicating that the users choose to access the drone-cell networks and/or the cellular network. After observing the random event vector ω(t) at time slot t, the game manager 202 sends a suggestion $s_i(t)\in\mathcal{A}_i(\omega_i(t))$ to each player 203-i, where $\mathcal{A}_i(\omega_i(t))=\mathcal{N}_i(t)\cup\{0\}$ is the finite set of actions available to the player 203-i. For example, s_(i)(t)=j indicates that the game manager 202 suggests that the player 203-i choose the network j. For convenience of description, this embodiment abbreviates $\mathcal{A}_i(\omega_i(t))$ as $\mathcal{A}_i(t)$.

Each player 203-i ($i\in\mathcal{I}$) chooses an action $\alpha_i(t)\in\mathcal{A}_i(t)$ based on the suggestion s_(i)(t). For example, α_(i)(t)=j indicates that the player 203-i chooses to access the network j. This embodiment lets s(t)=(s₁(t), s₂(t), . . . , s_(N)(t)) and α(t)=(α₁(t), α₂(t), . . . , α_(N)(t)) denote the suggestion vector and the action vector, respectively, and defines $\mathcal{A}(t)=\mathcal{A}_1(t)\times\cdots\times\mathcal{A}_N(t)$.

S104: obtaining an individual utility of each user according to the action vector and the random event vector.

The random event vector ω(t) and the action vector α(t) at time slot t determine the individual utility u_(i)(t) of each player 203-i. Formally, the individual utility u_(i)(t) takes the following form:

$u_i(t) = \hat{u}_i(\alpha(t),\omega(t)) \qquad (3)$

More explicitly, this embodiment uses the following expression to define u_(i)(t).

Definition 1: for all $i\in\mathcal{I}$, the individual utility u_(i)(t) can be defined as:

$\begin{matrix}{{u_{i}(t)} = {{{\hat{u}}_{i}( {{\alpha (t)},{\omega (t)}} )} = \{ \begin{matrix}{0,} & {{\alpha_{i}(t)} = 0} \\{{{R_{i}(t)}{f( {\eta_{i}(t)} )}},} & {otherwise}\end{matrix} }} & (4)\end{matrix}$

where f(x) is an effective transmission ratio function defined as:

$\begin{matrix}{{f(x)} = \{ \begin{matrix}{1,} & {0 \leq x \leq x_{b}} \\{{\frac{1}{( {1 - x_{b}} )^{2}}( {1 - x} )^{2}},} & {x_{b} < x \leq 1} \\{0,} & {otherwise}\end{matrix} } & (5)\end{matrix}$

where x_(b) is a constant representing the network busyness ratio threshold. Further,

${\eta_{i}(t)} = {\frac{\sum\limits_{k \in \mathcal{I}}^{\;}{1\{ {{\alpha_{k}(t)} = {\alpha_{i}(t)}} \} {R_{k}(t)}}}{C_{\alpha_{i}{(t)}}} \geq 0}$

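with 1{α_(k)(t)=α_(i)(t)} denoting a 0-1 indicator function, which equals one if α_(k)(t)=α_(i)(t) and zero otherwise. $C_{\alpha_i(t)}$ represents the capacity of the network corresponding to α_(i)(t). Thus, η_(i)(t) is the ratio of the total rate required by all users that select the same network as user i to the capacity of that network.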
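The utility definitions translate directly into code: the load on each network is the sum of the required rates of the users that selected it, the busyness ratio of a user is that load divided by the capacity of its network, and the effective transmission ratio scales the user's required rate. A minimal sketch, with actions, rates and capacities passed as plain Python containers; the function names are hypothetical.

```python
def effective_ratio(x, x_b):
    """Effective transmission ratio f(x): 1 up to x_b, quadratic decay to 0 at x = 1."""
    if 0 <= x <= x_b:
        return 1.0
    if x_b < x <= 1:
        return (1 - x) ** 2 / (1 - x_b) ** 2
    return 0.0


def utilities(actions, rates, capacities, x_b):
    """Individual utilities u_i(t) of all users.

    actions[i]: network chosen by user i (0 = empty network);
    rates[i]: required rate R_i(t); capacities[j]: capacity C_j(t) of network j >= 1."""
    load = {}
    for a, r in zip(actions, rates):
        load[a] = load.get(a, 0.0) + r            # total rate requested on network a
    result = []
    for a, r in zip(actions, rates):
        if a == 0:
            result.append(0.0)                     # empty network: zero utility
        else:
            eta = load[a] / capacities[a]          # busyness ratio eta_i(t)
            result.append(r * effective_ratio(eta, x_b))
    return result
```

For instance, with x_b=0.5, two users requesting 1 and 2 on a drone-cell of capacity 4 give η=0.75 and f=0.25, so they obtain utilities 0.25 and 0.5, while a lone user requesting 3 on a cellular network of capacity 10 keeps its full rate.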

This embodiment assumes that the upper bound of the required data transmission rate R_(i)(t) of each player 203-i is u_(i)^(max). Then, since 0≤f(x)≤1, Definition 1 implies that û_(i)(α(t),ω(t)) satisfies the following condition:

$0 \leq \hat{u}_i(\alpha(t),\omega(t)) \leq u_i^{\max} \qquad (6)$

S105: constructing a first selection model.

More specifically, the probability density function (denoted by π[ω]) of the random event vector ω(t) can be defined as follows:

$\pi[\omega] \triangleq \Pr[\omega(t)=\omega], \quad \forall \omega\in\Omega \qquad (7)$

where the notation “≜” means “defined as equal to”.

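This embodiment further defines the action probability Pr[α|ω] as a conditional probability density function over $\alpha\in\mathcal{A}$ and $\omega\in\Omega$, where $\mathcal{A}=\mathcal{A}_1\times\cdots\times\mathcal{A}_N$ and $\mathcal{A}_i=\bigcup_{\omega_i\in\Omega_i}\mathcal{A}_i(\omega_i)=\mathcal{M}$.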

According to probability theory, the action probability satisfies the first action probability constraint:

$\Pr[\alpha \mid \omega] \geq 0, \quad \forall \alpha\in\mathcal{A}, \omega\in\Omega \qquad (8)$

$\Pr[\alpha \mid \omega] = 0, \quad \forall \alpha\notin\mathcal{A}(\omega), \omega\in\Omega \qquad (9)$

$\sum_{\alpha\in\mathcal{A}} \Pr[\alpha \mid \omega] = 1, \quad \forall \omega\in\Omega \qquad (10)$

where $\mathcal{A}(\omega)=\mathcal{A}_1(\omega_1)\times\cdots\times\mathcal{A}_N(\omega_N)$, and $\mathcal{A}_i(\omega_i)$ represents the finite set of actions available to the player 203-i after the random event ω_(i) is observed.
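For a small finite game, the action probability can be stored as a table keyed by the random event vector and the action vector, and the non-negativity, feasibility and normalization constraints (8)-(10) checked directly. This is a hedged sketch: the dictionary layout, the tolerance `tol` and the function names are assumptions, and the feasible set under ω is built as the Cartesian product of the players' action sets.

```python
from itertools import product


def feasible_actions(per_user_sets):
    """Feasible action vectors under omega: the product of the players' action sets."""
    return set(product(*per_user_sets))


def is_valid_policy(policy, feasible, tol=1e-9):
    """Check (8)-(10) for a table policy[omega][alpha] = Pr[alpha | omega].

    feasible[omega] is the set of action vectors allowed under omega;
    action vectors missing from policy[omega] have probability zero."""
    for omega, dist in policy.items():
        if any(p < -tol for p in dist.values()):               # (8) non-negativity
            return False
        if any(p > tol for a, p in dist.items()
               if a not in feasible[omega]):                   # (9) infeasible actions
            return False
        if abs(sum(dist.values()) - 1.0) > tol:                # (10) normalization
            return False
    return True
```

Storing only the action vectors with non-zero probability keeps the table small, since the full product of action sets grows exponentially with the number of users.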

With the definition of Pr[α|ω], this embodiment defines a variable ū_(i) denoting the time average of the individual utility u_(i)(t). According to the law of large numbers, if the action vector α(t) is chosen independently at each time slot t according to the same conditional probability density function Pr[α|ω], it can be guaranteed that, for all $i\in\mathcal{I}$, ū_(i) can be written in the following form with probability 1 (w.p.1):

$\bar{u}_i = \sum_{\omega\in\Omega} \pi[\omega] \sum_{\alpha\in\mathcal{A}} \Pr[\alpha \mid \omega]\, \hat{u}_i(\alpha,\omega) \qquad (11)$
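The time average can be evaluated exactly on a small finite model as the π-weighted and Pr[α|ω]-weighted average of the utility over all (ω, α) pairs, and compared with a simulated long-run average, as the law-of-large-numbers argument predicts. The sketch assumes that ω(t) is drawn i.i.d. from π in every slot and α(t) is drawn independently from Pr[α|ω(t)]; the toy one-user model with two events and all names are hypothetical.

```python
import random


def time_average_utility(i, pi, policy, utility):
    """Sum over omega of pi[omega] * sum over alpha of Pr[alpha|omega] * u_hat_i(alpha, omega)."""
    return sum(pi[omega] * p * utility(i, alpha, omega)
               for omega, dist in policy.items()
               for alpha, p in dist.items())


def simulate_average(i, pi, policy, utility, slots, rng=random):
    """Empirical time average when omega(t) is i.i.d. with law pi (assumption)
    and alpha(t) is drawn independently from Pr[alpha | omega(t)] in each slot."""
    events = list(pi)
    weights = [pi[e] for e in events]
    total = 0.0
    for _ in range(slots):
        omega = rng.choices(events, weights=weights)[0]
        dist = policy[omega]
        alpha = rng.choices(list(dist), weights=list(dist.values()))[0]
        total += utility(i, alpha, omega)
    return total / slots


# Hypothetical one-user toy model with two random events.
pi = {"low": 0.5, "high": 0.5}
policy = {"low": {(1,): 1.0}, "high": {(1,): 0.5, (0,): 0.5}}


def toy_utility(i, alpha, omega):
    if alpha[i] == 0:
        return 0.0                       # empty network
    return 2.0 if omega == "high" else 1.0
```

Here the exact value is 0.5·1 + 0.5·(0.5·2 + 0.5·0) = 1.0, and a long simulation approaches the same number.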

In addition, since the game manager 202 aims to formulate Pr[α|ω] so as to maximize the total user utility while ensuring fairness among users, this embodiment designs an increasing and concave proportional fairness function ϕ(ū₁, ū₂, . . . , ū_(N)) for the game manager 202 as the first objective function of the first selection model. Explicitly, this embodiment supposes that the proportional fairness function ϕ(ū₁, ū₂, . . . , ū_(N)) is a sum of logarithmic functions:

$\phi(\bar{u}_1,\bar{u}_2,\ldots,\bar{u}_N) = \sum_{i\in\mathcal{I}} \log_2(\bar{u}_i) \qquad (12)$
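To see why the sum-of-logarithms objective trades total utility against fairness, the sketch below evaluates it together with Jain's fairness index, J = (Σ_i x_i)² / (N Σ_i x_i²), the fairness metric reported in FIG. 11 and FIG. 12. For two users with the same total of 4, the equal split (2, 2) scores higher than (3, 1) under both measures. The function names are hypothetical, and returning −∞ for a zero time-average utility is an assumption consistent with log₂(0).

```python
import math


def proportional_fairness(avg_utilities):
    """Sum of log2 of the time-average utilities (-inf if any of them is zero)."""
    if any(u <= 0 for u in avg_utilities):
        return float("-inf")
    return sum(math.log2(u) for u in avg_utilities)


def jain_index(xs):
    """Jain's fairness index (sum x)^2 / (N * sum x^2); equals 1 for equal shares."""
    sq = sum(x * x for x in xs)
    if sq == 0:
        raise ValueError("Jain's index is undefined when all values are zero")
    return sum(xs) ** 2 / (len(xs) * sq)
```

Because log₂ diverges to −∞ at zero, a policy that starves any single user can never maximize the objective, which is the fairness guarantee the concave objective is meant to provide.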

Each player 203-i, however, is interested in maximizing its own time average utility ū_(i). Thus, the players 203 may choose whether to accept the suggestions provided by the game manager 202. For each player 203-i, there are two types of selections, as presented below.

Participate:

if a player 203-i always accepts the suggestion s_(i)(t) at each time slot t∈{0, 1, 2, . . . }, the player is said to participate. That is, α_(i)(t)=s_(i)(t) for all t∈{0, 1, 2, . . . }.

Non-Participate:

if a player 203-i chooses the action α_(i)(t) according to its observed random event ω_(i)(t) at each time slot t∈{0, 1, 2, . . . }, the player is said to be non-participating.

This embodiment assumes that non-participating players 203 do not receive the suggestions s_(i)(t).

Definition 2: in order to incentivize all players 203 to participate, the Pr[α|ω] formulated by the game manager 202 needs to be a coarse correlated equilibrium (CCE), which is defined as follows:

For a stochastic game, Pr[α|ω] is a CCE if there exist first auxiliary variables φ_(i)(υ_(i))∈[0, u_(i)^(max)] for all $i\in\mathcal{I}$ and υ_(i)∈Ω_(i) satisfying the following conditions:

$\sum_{\omega\in\Omega}\pi[\omega]\sum_{\alpha\in\mathcal{A}}\Pr[\alpha\mid\omega]\,\hat{u}_i(\alpha,\omega) \geq \sum_{\omega\in\Omega}\pi[\omega]\sum_{\alpha\in\mathcal{A}}\Pr[\alpha\mid\omega]\,\varphi_i(\omega_i), \quad \forall i\in\mathcal{I} \qquad (13)$

$\sum_{\omega\in\Omega \mid \omega_i=\upsilon_i}\pi[\omega]\sum_{\alpha\in\mathcal{A}}\Pr[\alpha\mid\omega]\,\varphi_i(\upsilon_i) \geq \sum_{\omega\in\Omega \mid \omega_i=\upsilon_i}\pi[\omega]\sum_{\alpha\in\mathcal{A}}\Pr[\alpha\mid\omega]\,\hat{u}_i((\beta_i,\alpha_{\bar{i}}),\omega), \quad \forall i\in\mathcal{I}, \upsilon_i\in\Omega_i, \beta_i\in\mathcal{A}_i \qquad (14)$

where α_(ī)=α\{α_(i)} represents all entries of the action vector α except α_(i), $\beta_i\in\mathcal{A}_i$ is a preset specific action of a non-participating player 203-i, and υ_(i)∈Ω_(i) is a preset specific event of a non-participating player 203-i. Intuitively, φ_(i)(υ_(i)) represents the largest conditional expected utility earned by the non-participating player 203-i when ω_(i)=υ_(i) is observed.
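Because the auxiliary variables φ_i appear only in (13) and (14), a candidate Pr[α|ω] can be tested by eliminating them: (14) is tightest when φ_i(υ_i)·Pr[ω_i=υ_i] equals the largest deviation value over β_i, and substituting this into (13) leaves one inequality per player. The sketch below performs that check for small finite games. It assumes non-negative utilities as in (6), that `policy` lists every ω with π[ω]>0, and that the deviation set and the observation map are supplied by the caller; `cce_slack` and the toy `congestion_utility` are hypothetical names, not part of the disclosure.

```python
from collections import defaultdict


def cce_slack(pi, policy, utility, users, deviations, local):
    """Return {i: slack_i}; the policy is a CCE exactly when every slack_i >= 0.

    pi[omega]: probability of the random event vector omega
    policy[omega][alpha]: Pr[alpha | omega], alpha a tuple indexed by user
    utility(i, alpha, omega): u_hat_i(alpha, omega), assumed >= 0
    deviations(i, v): preset actions beta_i considered when omega_i = v
    local(i, omega): extracts user i's own observation omega_i
    """
    slack = {}
    for i in users:
        u_bar = 0.0
        dev = defaultdict(float)   # (v, beta) -> right-hand side of (14)
        for omega, dist in policy.items():
            v = local(i, omega)
            for alpha, p in dist.items():
                w = pi[omega] * p
                u_bar += w * utility(i, alpha, omega)
                for beta in deviations(i, v):
                    deviated = alpha[:i] + (beta,) + alpha[i + 1:]
                    dev[(v, beta)] += w * utility(i, deviated, omega)
        # Smallest feasible phi_i(v) * Pr[omega_i = v] is the best deviation value.
        best = defaultdict(float)
        for (v, beta), value in dev.items():
            best[v] = max(best[v], value)
        slack[i] = u_bar - sum(best.values())
    return slack


def congestion_utility(i, alpha, omega):
    """Toy utility: 0 on the empty network, 1 alone, 0.4 when sharing."""
    if alpha[i] == 0:
        return 0.0
    return 1.0 if alpha.count(alpha[i]) == 1 else 0.4


pi = {"w": 1.0}
split = {"w": {(1, 2): 0.5, (2, 1): 0.5}}     # manager separates the two users
crowded = {"w": {(1, 1): 1.0}}                # both users pushed onto network 1
args = dict(users=(0, 1), deviations=lambda i, v: (1, 2), local=lambda i, omega: omega)
```

With `split`, each user earns 1.0 by following the suggestions but only 0.7 by always picking one fixed network, so both slacks are about 0.3 and the rule is a CCE. With `crowded`, a user gains by switching to network 2, and both slacks are about −0.6.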

According to Definition 2, the total number of the CCE constraints (13) and (14) is $N+\sum_{i\in\mathcal{I}}|\Omega_i||\mathcal{A}_i|$, which is a linear function of the sizes of the sets Ω_(i) and $\mathcal{A}_i$, where |⋅| represents the number of elements in a set. The value of $|\Omega_i||\mathcal{A}_i|$, however, is too large in the system model of this embodiment, so computing a CCE is complex. This embodiment next describes how to reduce the value of $|\Omega_i||\mathcal{A}_i|$.

First, this embodiment simplifies the value space Ω_(i) of the preset event υ_(i). From the description of the network model in this embodiment, all drone-cells are homogeneous. Therefore, for the preset event υ_(i), this embodiment only considers the number of accessible drone-cell networks of a user i∈ℐ, rather than their indexes. Furthermore, the probability that a user i is covered by more than two drone-cell networks simultaneously is small. Thus, the set 𝒜_(i)(t) can be simplified as {0, 1, 2+}, where "0", "1", and "2+" indicate that the number of drone-cell networks covering user i is zero, one, and not less than two, respectively. In addition, for the preset event υ_(i), this embodiment divides the interval [0, u_(i)^(max)] uniformly into K_(R) segments. If

$$R_i(t)\in\left[\frac{(i_k-1)\,u_i^{\max}}{K_R},\ \frac{i_k\,u_i^{\max}}{K_R}\right),$$

where i_(k)=1, 2, . . . , K_(R), then R_(i)(t) belongs to the i_(k)-th segment. To sum up, this embodiment simplifies the value space Ω_(i) of the preset event υ_(i) to Ω_(i)^(s)={0, 1, 2+}×{1, 2, . . . , K_(R)}. Note that the value space of ω_(i) is still Ω_(i). For all ω_(i)∈Ω_(i) and υ_(i)∈Ω_(i)^(s), ω_(i)=υ_(i) in the constraint (14) indicates that ω_(i) is the original form of υ_(i), and in the constraint (13), ϕ_(i)(ω_(i))=ϕ_(i)(υ_(i)|υ_(i)=ω_(i)), with υ_(i)=ω_(i) indicating that υ_(i) is the simplified form of ω_(i).
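The event simplification above can be illustrated with a short Python sketch. It maps a raw per-user observation (the number of drone-cells covering user i and the rate R_(i)(t)) to a simplified event in Ω_(i)^(s). The function name and argument layout are hypothetical, and clamping R_(i)(t)=u_(i)^(max) into the K_(R)-th segment is an assumption, since the half-open intervals above leave that endpoint uncovered.

```python
def simplify_event(num_drone_cells: int, rate: float, u_max: float, k_r: int) -> tuple:
    """Map a raw event of user i to its simplified form in {0, 1, 2+} x {1..K_R}.

    num_drone_cells: number of drone-cell networks currently covering user i
    rate: transmission rate R_i(t), expected to lie in [0, u_max]
    """
    if num_drone_cells < 0 or rate < 0 or u_max <= 0 or k_r < 1:
        raise ValueError("invalid event or quantization parameters")
    # Homogeneous drone-cells: keep only the coverage count, lumping >= 2 into "2+".
    coverage = {0: "0", 1: "1"}.get(num_drone_cells, "2+")
    # Segment i_k satisfies (i_k - 1) * u_max / K_R <= R_i(t) < i_k * u_max / K_R.
    segment = int(rate / u_max * k_r) + 1
    # Assumption: R_i(t) == u_max (outside the half-open intervals) goes to the last segment.
    segment = min(segment, k_r)
    return coverage, segment


print(simplify_event(1, 2.5, 10.0, 4))  # ('1', 2)
```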

Second, this embodiment simplifies the value space 𝒜_(i) of the preset action β_(i). Owing to the homogeneity of drone-cells, this embodiment does not distinguish the index of the drone-cell network that a user i chooses to access under the preset action β_(i). Meanwhile, since the individual utility u_(i)(t) of a user i at time slot t equals zero when the user i accesses the empty network according to Definition 1, this embodiment does not consider the preset action of accessing the empty network, i.e., β_(i)≠0. Therefore, this embodiment simplifies the value space 𝒜_(i) of the preset action β_(i) to 𝒜_(i)^(s)={cellular, drone-cell}, where "cellular" and "drone-cell" indicate that the user i chooses to access a cellular network and a drone-cell network, respectively. In addition, when choosing to access a drone-cell network, the user i accesses one of its accessible drone-cell networks at random with equal probability. Note that the value space of α_(i) is still 𝒜_(i).

Therefore, the value of |Ω_(i)||𝒜_(i)| is reduced to |Ω_(i)^(s)||𝒜_(i)^(s)|=(|{0, 1, 2+}|×K_(R))×|𝒜_(i)^(s)|=3K_(R)×2=6K_(R). Meanwhile, the preset action β_(i)=drone-cell is infeasible when there is no accessible drone-cell network for a user i. Thus, this embodiment can ignore the following preset event-preset action pairs: {(0, i_(k); drone-cell), ∀i_(k)∈{1, 2, . . . , K_(R)}}. In this way, the number of retained pairs per user is further reduced to 6K_(R)−K_(R)=5K_(R). Finally, the total number of the CCE constraints (13) and (14) is reduced to N+∑_(i∈ℐ)|Ω_(i)^(s)||𝒜_(i)^(s)|=(5K_(R)+1)N.
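The constraint count can be checked by direct enumeration. The sketch below, with a hypothetical helper name, enumerates the simplified event-action pairs of one user, drops the infeasible (0, i_(k); drone-cell) pairs, and adds one constraint (13) per user:

```python
from itertools import product


def count_cce_constraints(n_users: int, k_r: int) -> int:
    """Number of CCE constraints (13)-(14) after the simplifications above."""
    coverage = ["0", "1", "2+"]           # simplified drone-cell coverage
    segments = range(1, k_r + 1)          # rate segments i_k
    actions = ["cellular", "drone-cell"]  # simplified preset actions A_i^s
    kept_pairs = [
        (c, s, b)
        for c, s, b in product(coverage, segments, actions)
        # deviating to a drone-cell is infeasible when no drone-cell covers the user
        if not (c == "0" and b == "drone-cell")
    ]
    # one constraint (13) per user + one constraint (14) per retained pair per user
    return n_users * (1 + len(kept_pairs))


print(count_cce_constraints(10, 5))  # 260, i.e. (5 * 5 + 1) * 10
```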

In actual scenarios, some users have a requirement on their minimum time average utilities (MTAU), and this embodiment denotes the set of these users as S_(u). Therefore, the game manager 202 must guarantee that the utilities of these users satisfy the following MTAU constraints, where u_(i)^(c) is the minimum time average utility required by user i:

$$\bar{u}_i\ge u_i^{c},\quad\forall i\in S_u \tag{15}$$

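Based on the above analysis, the constructed first selection model includes a first objective function and a first constraint. The first constraint includes a first coarse correlated equilibrium constraint, a first minimum individual time average utility constraint and a first action probability constraint. The constructed first selection model is as follows:

$$\begin{aligned}
\underset{\Pr[\alpha\mid\omega],\ \phi_i(\upsilon_i)}{\text{Maximize}}\quad & \varphi(\bar{u}_1,\ldots,\bar{u}_N)\\
\text{subject to:}\quad & \bar{u}_i=\sum_{\omega\in\Omega}\sum_{\alpha\in\mathcal{A}}\pi[\omega]\Pr[\alpha\mid\omega]\,\hat{u}_i(\alpha,\omega),\quad\forall i\in\mathcal{I}\\
& \sum_{\omega\in\Omega}\sum_{\alpha\in\mathcal{A}}\pi[\omega]\Pr[\alpha\mid\omega]\,\hat{u}_i(\alpha,\omega)\ge\sum_{\omega\in\Omega}\sum_{\alpha\in\mathcal{A}}\pi[\omega]\Pr[\alpha\mid\omega]\,\phi_i(\omega_i),\quad\forall i\in\mathcal{I}\\
& \sum_{\omega\in\Omega\mid\omega_i=\upsilon_i}\sum_{\alpha\in\mathcal{A}}\pi[\omega]\Pr[\alpha\mid\omega]\,\phi_i(\upsilon_i)\ge\sum_{\omega\in\Omega\mid\omega_i=\upsilon_i}\sum_{\alpha\in\mathcal{A}}\pi[\omega]\Pr[\alpha\mid\omega]\,\hat{u}_i\big((\beta_i,\alpha_{\bar{i}}),\omega\big),\quad\forall i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s},\ \beta_i\in\mathcal{A}_i^{s}\\
& \bar{u}_i\ge u_i^{c},\quad\forall i\in S_u\\
& \Pr[\alpha\mid\omega]\ge 0,\quad\forall\alpha\in\mathcal{A},\ \omega\in\Omega\\
& \Pr[\alpha\mid\omega]=0,\quad\forall\alpha\notin\mathcal{A}(\omega),\ \omega\in\Omega\\
& \sum_{\alpha\in\mathcal{A}}\Pr[\alpha\mid\omega]=1,\quad\forall\omega\in\Omega
\end{aligned}\tag{16}$$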

S106: obtaining a value of an action probability according to the first selection model, and determining the networks that the users choose to access according to the value of the action probability.

The objective of the game manager 202 is to solve the first selection model to obtain the action probability Pr[α|ω], and to choose a suggestion vector s(t)=α(t) according to Pr[α|ω], so as to determine the networks that the users choose to access according to the suggestion vector.
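For a single observed ω(t), choosing the suggestion vector according to Pr[α|ω] amounts to sampling from a finite distribution. The sketch below assumes a hypothetical representation in which each action vector is a tuple whose i-th entry is the network index suggested to user i (0 being the empty network), and Pr[·|ω(t)] is given as a dictionary:

```python
import random


def sample_suggestion(action_probs: dict, rng: random.Random) -> tuple:
    """Draw a suggestion vector s(t) = alpha(t) from Pr[alpha | omega(t)].

    action_probs maps action-vector tuples to probabilities that sum to 1.
    """
    actions = list(action_probs)
    weights = [action_probs[a] for a in actions]
    # random.choices draws with replacement according to the given weights
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(7)
probs = {(1, 2): 0.25, (2, 2): 0.75}  # two users; networks 1 and 2
print(sample_suggestion(probs, rng))
```

Sampling a fresh α(t) at each slot keeps the empirical frequency of each suggestion close to Pr[α|ω] over time.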

The network selection method provided by this embodiment includes: acquiring the dynamic network model and the dynamic user model; generating the random event vector according to the dynamic network model and the dynamic user model; constructing the first selection model according to the random event vector and the action vector; and obtaining the value of the action probability according to the first selection model, so as to determine the networks that the users choose to access according to the value of the action probability. Existing game-theory-based approaches cannot solve the network selection problem for integrated cellular and drone-cell networks, because such networks are highly dynamic and their network state is hard to predict; this embodiment solves that problem.

Although the above problem (16) is a convex optimization problem, it is highly challenging to solve for the following two reasons: 1) π[ω] is essential to solve the problem (16); however, it may be impossible to obtain π[ω], because π[ω] is influenced by various factors such as network capacity, the mobility of drones and users, and user traffic. 2) The size of the variable Pr[α|ω] is

$$\prod_{i\in\mathcal{I}}|\mathcal{A}_i|\ \prod_{i\in\mathcal{I}\cup\{0\}}|\Omega_i|,$$

which increases exponentially with the number of users. To solve these issues, this embodiment converts this challenging problem into a new problem that does not require knowledge of π[ω] and whose size is greatly reduced.

This embodiment illustrates a network selection method according to another exemplary embodiment, which differs from the embodiment illustrated in FIG. 2 in that, after S104, it includes:

S1051: constructing a second selection model.

More specifically, for a real-valued stochastic process u(t) over time slots t∈{0, 1, 2, . . . }, this embodiment defines its time average expectation over the first t time slots as:

$$\bar{u}(t)=\frac{1}{t}\sum_{\tau=0}^{t-1}E\left[u(\tau)\right] \tag{17}$$

For all i∈ℐ, υ_(i)∈Ω_(i)^(s) and β_(i)∈𝒜_(i)^(s), this embodiment defines:

$$u_{i,\upsilon_i}^{(\beta_i)}(t)\triangleq\hat{u}_i\big((\beta_i,\alpha_{\bar{i}}(t)),\omega(t)\big)\,1\{\omega_i(t)=\upsilon_i\} \tag{18}$$

By stochastic process theory, this embodiment equivalently converts the first selection model (16) into the second selection model. The second selection model includes a second objective function and a second constraint. The second objective function is a proportional fairness function whose independent variables are the time average expectations of the individual utilities. The second constraint includes at least second coarse correlated equilibrium constraints, a second minimum individual time average utility constraint and a second auxiliary variable constraint: the second coarse correlated equilibrium constraints constrain the time average expectations of the individual utilities and of the second auxiliary variables, the second minimum individual time average utility constraint constrains the time average expectations of the individual utilities, and the second auxiliary variable constraint constrains the second auxiliary variables themselves. At each time slot t∈{0, 1, 2, . . . }, the game manager 202 observes the random event vector ω(t)∈Ω and solves for an action vector α(t)∈𝒜(t) and variables ϕ_(i,υ_i)(t), where ϕ_(i,υ_i)(t) represents the second auxiliary variable at time slot t:

$$\begin{aligned}
\underset{\alpha(t),\ \phi_{i,\upsilon_i}(t)}{\text{Maximize}}\quad & \liminf_{t\to\infty}\ \varphi\big(\bar{u}_1(t),\ldots,\bar{u}_N(t)\big)\\
\text{subject to:}\quad & \liminf_{t\to\infty}\Big[\bar{u}_i(t)-\sum_{\upsilon_i\in\Omega_i^{s}}\bar{\phi}_{i,\upsilon_i}(t)\Big]\ge 0,\quad\forall i\in\mathcal{I}\\
& \liminf_{t\to\infty}\Big[\bar{\phi}_{i,\upsilon_i}(t)-\bar{u}_{i,\upsilon_i}^{(\beta_i)}(t)\Big]\ge 0,\quad\forall i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s},\ \beta_i\in\mathcal{A}_i^{s}\\
& 0\le\phi_{i,\upsilon_i}(t)\le u_i^{\max}\,1\{\omega_i(t)=\upsilon_i\},\quad\forall t\in\{0,1,2,\ldots\},\ i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s}\\
& \liminf_{t\to\infty}\big[\bar{u}_i(t)-u_i^{c}\big]\ge 0,\quad\forall i\in S_u\\
& \alpha(t)\in\mathcal{A}(t),\quad\forall t\in\{0,1,2,\ldots\}
\end{aligned}\tag{19}$$

S106: obtaining a value of the action vector according to the second selection model, and determining the networks that the users choose to access according to the value of the action vector.

In the network selection method provided in this embodiment, the constructed second selection model is based on the time average expectations of the individual utilities; therefore, the action vector can be obtained even when the probability of the random event vector is unknown, and the networks that the users choose to access are determined according to the value of the action vector.

This embodiment illustrates a network selection method according to another exemplary embodiment, which differs from the previous embodiment in that, after S1051, it further includes:

S1052: constructing a third selection model.

More specifically, the objective of the above problem (19) is to maximize a nonlinear function of time averages. In order to convert it equivalently into the maximization of the time average of a nonlinear function, this embodiment introduces a third auxiliary vector γ(t)=(γ₁(t), . . . , γ_(N)(t)) with 0≤γ_(i)(t)≤u_(i)^(max) for all i∈ℐ, and defines g(t)=φ(γ₁(t), . . . , γ_(N)(t)). Since φ is concave, Jensen's inequality then indicates that

$$\bar{g}(t)\le\varphi\big(\bar{\gamma}_1(t),\ldots,\bar{\gamma}_N(t)\big) \tag{20}$$

This embodiment converts the second selection model (19) into a third selection model (21) by leveraging Jensen's inequality. The third selection model includes a third objective function and a third constraint. The third objective function (21.1) is the time average expectation of a proportional fairness function whose independent variables are the third auxiliary variables. The third constraint includes at least the second coarse correlated equilibrium constraints (21.4) and (21.5), the second minimum individual time average utility constraint (21.7), the second auxiliary variable constraint (21.6) and the third auxiliary variable constraints (21.2) and (21.3). At each time slot t∈{0, 1, 2, . . . }, the game manager 202 observes the random event vector ω(t)∈Ω and chooses an action vector α(t)∈𝒜(t), variables ϕ_(i,υ_i)(t) and an auxiliary vector γ(t) to solve:

$$\begin{aligned}
\underset{\alpha(t),\ \phi_{i,\upsilon_i}(t),\ \gamma(t)}{\text{Maximize}}\quad & \liminf_{t\to\infty}\ \bar{g}(t) && (21.1)\\
\text{subject to:}\quad & \lim_{t\to\infty}\big|\bar{\gamma}_i(t)-\bar{u}_i(t)\big|=0,\quad\forall i\in\mathcal{I} && (21.2)\\
& 0\le\gamma_i(t)\le u_i^{\max},\quad\forall t\in\{0,1,2,\ldots\},\ i\in\mathcal{I} && (21.3)\\
& \liminf_{t\to\infty}\Big[\bar{u}_i(t)-\sum_{\upsilon_i\in\Omega_i^{s}}\bar{\phi}_{i,\upsilon_i}(t)\Big]\ge 0,\quad\forall i\in\mathcal{I} && (21.4)\\
& \liminf_{t\to\infty}\Big[\bar{\phi}_{i,\upsilon_i}(t)-\bar{u}_{i,\upsilon_i}^{(\beta_i)}(t)\Big]\ge 0,\quad\forall i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s},\ \beta_i\in\mathcal{A}_i^{s} && (21.5)\\
& 0\le\phi_{i,\upsilon_i}(t)\le u_i^{\max}\,1\{\omega_i(t)=\upsilon_i\},\quad\forall t\in\{0,1,2,\ldots\},\ i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s} && (21.6)\\
& \liminf_{t\to\infty}\big[\bar{u}_i(t)-u_i^{c}\big]\ge 0,\quad\forall i\in S_u && (21.7)\\
& \alpha(t)\in\mathcal{A}(t),\quad\forall t\in\{0,1,2,\ldots\} && (21.8)
\end{aligned}$$

where the second coarse correlated equilibrium constraints (21.4) and (21.5) constrain the time average expectations of the individual utilities and of the second auxiliary variables, the second minimum individual time average utility constraint (21.7) constrains the time average expectations of the individual utilities, the second auxiliary variable constraint (21.6) constrains the second auxiliary variables, and the third auxiliary variable constraints (21.2) and (21.3) constrain the time average expectations of the third auxiliary variables and of the individual utilities.

S106: obtaining the value of the action vector according to the third selection model, and determining the networks that the users choose to access according to the value of the action vector.

In the network selection method provided in this embodiment, the time average expectation of the proportional fairness function of the third auxiliary variables is taken as the third objective function, which simplifies the objective function so that the action vector can be obtained according to the third selection model.

This embodiment illustrates a network selection method according to another exemplary embodiment, which differs from the previous embodiment in that, after S1052, it includes:

S1053: converting the third selection model into a fourth selection model by leveraging a drift-plus-penalty technique.

Following the principle of the drift-plus-penalty technique, to enforce the constraint (21.4), this embodiment defines the first term Q_(i)(t) of the first virtual queue for all i∈ℐ in the following form:

$$Q_i(t+1)=Q_i(t)+\sum_{\upsilon_i\in\Omega_i^{s}}\phi_{i,\upsilon_i}(t)-u_i(t) \tag{22}$$

The constraint (21.4) is satisfied if the following mean-rate stability condition holds:

$$\lim_{t\to\infty}\frac{E\big[[Q_i(t)]^{+}\big]}{t}=0 \tag{23}$$

where the nonnegative operation is [x]⁺=max{x, 0}.

Likewise, to enforce the constraints (21.5), (21.2) and (21.7), this embodiment defines three other types of virtual queues, respectively. The second term D_(i,υ_i)^((β_i))(t) of the first virtual queue is defined for all i∈ℐ, υ_(i)∈Ω_(i)^(s), and β_(i)∈𝒜_(i)^(s):

$$D_{i,\upsilon_i}^{(\beta_i)}(t+1)=D_{i,\upsilon_i}^{(\beta_i)}(t)+u_{i,\upsilon_i}^{(\beta_i)}(t)-\phi_{i,\upsilon_i}(t) \tag{24}$$

The second virtual queue Z_(i)(t) is defined for all i∈ℐ:

$$Z_i(t+1)=Z_i(t)+\gamma_i(t)-u_i(t) \tag{25}$$

The third virtual queue H_(i)(t) is defined for all i∈S_(u):

$$H_i(t+1)=H_i(t)+u_i^{c}-u_i(t) \tag{26}$$

The constraints (21.5), (21.2) and (21.7) are satisfied if the following mean-rate stability conditions hold:

$$\lim_{t\to\infty}\frac{E\big[[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}\big]}{t}=0 \tag{27}$$

$$\lim_{t\to\infty}\frac{E[Z_i(t)]}{t}=0 \tag{28}$$

$$\lim_{t\to\infty}\frac{E\big[[H_i(t)]^{+}\big]}{t}=0 \tag{29}$$

For simplicity, this embodiment assumes that all virtual queues are initialized to zero.

According to the formulas (22), (24), (25) and (26), the first virtual value in the first virtual queue at the current time slot is generated according to the violation of the second coarse correlated equilibrium constraints at the previous time slot and the first virtual value in the first virtual queue at the previous time slot. The second virtual value in the second virtual queue at the current time slot is generated according to the violation of the third auxiliary variable constraints at the previous time slot and the second virtual value in the second virtual queue at the previous time slot. The third virtual value in the third virtual queue at the current time slot is generated according to the violation of the second minimum individual time average utility constraint at the previous time slot and the third virtual value in the third virtual queue at the previous time slot.
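The queue recursions (22) and (24)-(26) translate directly into per-user bookkeeping. The following sketch updates the four queues of one user i after time slot t; the class and function names and the dictionary keys (υ_(i), β_(i)) are hypothetical, and the caller is assumed to pass u_(i,υ_i)^((β_i))(t) for every pair, including zeros when υ_(i)≠ω_(i)(t):

```python
from dataclasses import dataclass, field


@dataclass
class UserQueues:
    """Virtual queues of one user i (all initialized to zero)."""
    q: float = 0.0                         # Q_i(t), first term of the first queue
    d: dict = field(default_factory=dict)  # D_{i,upsilon}^{(beta)}(t), keyed by (upsilon, beta)
    z: float = 0.0                         # Z_i(t), second queue
    h: float = 0.0                         # H_i(t), third queue (only used if i in S_u)


def update_queues(qs, u, phi, u_dev, gamma, u_c=None):
    """Apply (22), (24), (25) and (26) for one time slot.

    u:     individual utility u_i(t)
    phi:   dict upsilon -> phi_{i,upsilon}(t)
    u_dev: dict (upsilon, beta) -> u_{i,upsilon}^{(beta)}(t), for all pairs
    gamma: third auxiliary variable gamma_i(t)
    u_c:   minimum utility u_i^c if i is in S_u, else None
    """
    qs.q += sum(phi.values()) - u  # (22)
    for key, val in u_dev.items():
        upsilon = key[0]
        qs.d[key] = qs.d.get(key, 0.0) + val - phi.get(upsilon, 0.0)  # (24)
    qs.z += gamma - u  # (25)
    if u_c is not None:
        qs.h += u_c - u  # (26)
    return qs


ev = ("1", 2)  # simplified event: one covering drone-cell, rate segment 2
qs = update_queues(UserQueues(), u=3.0, phi={ev: 5.0},
                   u_dev={(ev, "cellular"): 4.0, (ev, "drone-cell"): 2.0},
                   gamma=4.0, u_c=2.0)
print(qs.q, qs.z, qs.h)  # 2.0 1.0 -1.0
```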

This embodiment defines a function L(t) as the sum of squares of all four types of queues [Q_(i)(t)]⁺, [D_(i,υ_i)^((β_i))(t)]⁺, Z_(i)(t) and [H_(i)(t)]⁺ (divided by 2 for convenience) at time slot t. This function, called a Lyapunov function, measures the total violation:

$\begin{matrix}{{L(t)}\overset{\bigtriangleup}{=}{{\frac{1}{2}{\sum\limits_{i \in \mathcal{I}}^{\;}( \lbrack {Q_{i}(t)} \rbrack^{+} )^{2}}} + {\frac{1}{2}{\sum\limits_{i \in \mathcal{I}}^{\;}{\sum\limits_{{\upsilon_{i} \in \Omega_{i}^{s}},{\beta_{i} \in A_{i}^{s}}}^{\;}( \lbrack {D_{i,\upsilon_{i}}^{(\beta_{i})}(t)} \rbrack^{+} )^{2}}}} + {\frac{1}{2}{\sum\limits_{i \in \mathcal{I}}^{\;}{Z_{i}(t)}^{2}}} + {\frac{1}{2}{\sum\limits_{i \in \mathcal{I}}^{\;}( \lbrack {H_{i}(t)} \rbrack^{+} )^{2}}}}} & (30)\end{matrix}$

where, H_(i)(t)=0 for all i∉S_(u).
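As a small illustration of (30), the sketch below evaluates L(t) from per-user queue values and forms the drift-plus-penalty value Δ(t)−Vg(t) used in the following paragraph. The flat-list data layout and function names are assumptions made for brevity; H values of users outside S_(u) are passed as zero, as stated above:

```python
def lyapunov(q, d, z, h):
    """L(t) of (30).

    q, z, h: per-user lists of Q_i(t), Z_i(t), H_i(t) (H_i(t) = 0 for users not in S_u)
    d:       per-user lists of all D_{i,upsilon}^{(beta)}(t) values
    """
    def pos(x):
        return max(x, 0.0)  # [x]^+

    total = sum(pos(x) ** 2 for x in q)
    total += sum(pos(x) ** 2 for user_d in d for x in user_d)
    total += sum(x ** 2 for x in z)  # Z_i(t) enters without the [.]^+ operation
    total += sum(pos(x) ** 2 for x in h)
    return 0.5 * total


def drift_plus_penalty(l_now, l_next, v, g_now):
    """Delta(t) - V g(t), with Delta(t) = L(t+1) - L(t)."""
    return (l_next - l_now) - v * g_now


print(lyapunov([2.0, -1.0], [[-1.0, 3.0], [0.0]], [1.0, -2.0], [-1.0, 0.0]))  # 9.0
```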

Besides, this embodiment defines a drift-plus-penalty expression Δ(t)−Vg(t), where Δ(t)=L(t+1)−L(t) represents the Lyapunov drift, that is, the drift of the total violation; −g(t) is a "penalty", where g(t) is the proportional fairness function of the third auxiliary variables; and V is a non-negative penalty coefficient that controls the trade-off between constraint violation and optimality. Minimizing the drift-plus-penalty expression pursues both goals at once: reducing the constraint violation and maximizing the objective. Expanding Δ(t) with the queue updates (22) and (24)-(26), the fourth selection model is constructed as the following upper bound:

$$\begin{aligned}
\Delta(t)-Vg(t)\le\ & B+\sum_{i\in\mathcal{I}}[H_i(t)]^{+}u_i^{c}-V\varphi\big(\gamma_1(t),\ldots,\gamma_N(t)\big)+\sum_{i\in\mathcal{I}}Z_i(t)\gamma_i(t)\\
& +\sum_{i\in\mathcal{I}}\Big\{[Q_i(t)]^{+}\sum_{\upsilon_i\in\Omega_i^{s}}\phi_{i,\upsilon_i}(t)\Big\}-\sum_{i\in\mathcal{I}}\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}\phi_{i,\upsilon_i}(t)\\
& +\sum_{i\in\mathcal{I}}\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}u_{i,\upsilon_i}^{(\beta_i)}(t)-\sum_{i\in\mathcal{I}}\big\{[Q_i(t)]^{+}+Z_i(t)+[H_i(t)]^{+}\big\}u_i(t)
\end{aligned}\tag{31}$$

where the right-hand side is a penalty upper bound that includes a constant term, a first penalty upper bound term, a second penalty upper bound term and a third penalty upper bound term. The constant term is

$$B\triangleq\sum_{i\in\mathcal{I}}(u_i^{\max})^2+\frac{1}{2}\sum_{i\in S_u}(u_i^{\max})^2+\frac{1}{2}\sum_{i\in\mathcal{I}}|\mathcal{A}_i^{s}|(u_i^{\max})^2,$$

the first penalty upper bound term is

$$-V\varphi\big(\gamma_1(t),\ldots,\gamma_N(t)\big)+\sum_{i\in\mathcal{I}}Z_i(t)\gamma_i(t),$$

the second penalty upper bound term is

$$\sum_{i\in\mathcal{I}}\Big\{[Q_i(t)]^{+}\sum_{\upsilon_i\in\Omega_i^{s}}\phi_{i,\upsilon_i}(t)\Big\}-\sum_{i\in\mathcal{I}}\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}\phi_{i,\upsilon_i}(t),$$

and the third penalty upper bound term is

$$\sum_{i\in\mathcal{I}}\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}u_{i,\upsilon_i}^{(\beta_i)}(t)-\sum_{i\in\mathcal{I}}\big\{[Q_i(t)]^{+}+Z_i(t)+[H_i(t)]^{+}\big\}u_i(t).$$

The remaining term ∑_(i∈ℐ)[H_(i)(t)]⁺u_(i)^(c) does not depend on the decisions made at time slot t.

S106: obtaining the value of the action vector according to the fourth selection model, and determining the networks that the users choose to access according to the value of the action vector.

In the network selection method provided in this embodiment, the fourth selection model takes the form of an inequality whose right-hand side separates into terms that each involve a different group of decision variables; the model is therefore simple, and the action vector is easy to obtain from it.

This embodiment illustrates a network selection method according to another exemplary embodiment, which differs from the previous embodiment in that S106 (obtaining the value of the action vector according to the fourth selection model, and determining the networks that the users choose to access according to the value of the action vector) specifically includes the following steps:

S1061: at each time slot t, the game manager 202 observes the first term Q_(i)(t) and the second term D_(i,υ_i)^((β_i))(t) of the first virtual value in the first virtual queue at the current time slot, the second virtual value Z_(i)(t) in the second virtual queue at the current time slot, the third virtual value H_(i)(t) in the third virtual queue at the current time slot, and the random event vector ω(t)∈Ω.

This embodiment solves the problem (21) by greedily minimizing the upper bound of Δ(t)−Vg(t) at each time slot t. The upper bound of Δ(t)−Vg(t) can be decomposed into four independent terms. At each time slot t, the first term is a constant, the second term is a function of the third auxiliary vector γ(t), the third term is a function of the second auxiliary variables ϕ_(i,υ_i)(t), and the fourth term is a function of the individual utilities u_(i)(t) and u_(i,υ_i)^((β_i))(t).

S1062: choosing the value of the third auxiliary variables γ_(i)(t) for all i∈ℐ according to the second virtual value at the current time slot and the first penalty upper bound term, to solve:

$$\begin{aligned}
\text{Minimize}\quad & -V\varphi\big(\gamma_1(t),\ldots,\gamma_N(t)\big)+\sum_{i\in\mathcal{I}}Z_i(t)\gamma_i(t)\\
\text{subject to:}\quad & 0\le\gamma_i(t)\le u_i^{\max},\quad\forall i\in\mathcal{I}
\end{aligned}\tag{32}$$

The closed-form solution of the problem (32) takes the following form for all i∈ℐ:

$$\gamma_i(t)=\begin{cases}u_i^{\max}, & Z_i(t)\le 0\\[4pt] \min\Big\{\dfrac{V}{Z_i(t)\ln 2},\ u_i^{\max}\Big\}, & Z_i(t)>0\end{cases} \tag{33}$$

S1063: choosing the value of the second auxiliary variables ϕ_(i,υ_i)(t) for all i∈ℐ and υ_(i)∈Ω_(i)^(s) according to the random event vector, the first virtual value at the current time slot and the second penalty upper bound term, to solve:

$$\begin{aligned}
\text{Minimize}\quad & \sum_{i\in\mathcal{I}}\Big\{[Q_i(t)]^{+}\sum_{\upsilon_i\in\Omega_i^{s}}\phi_{i,\upsilon_i}(t)\Big\}-\sum_{i\in\mathcal{I}}\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}\phi_{i,\upsilon_i}(t)\\
\text{subject to:}\quad & 0\le\phi_{i,\upsilon_i}(t)\le u_i^{\max}\,1\{\omega_i(t)=\upsilon_i\},\quad\forall i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s}
\end{aligned}\tag{34}$$

The closed-form solution of the problem (34) takes the following form for all i∈ℐ and υ_(i)∈Ω_(i)^(s):

$$\phi_{i,\upsilon_i}(t)=\begin{cases}u_i^{\max}, & \omega_i(t)=\upsilon_i\ \text{and}\ [Q_i(t)]^{+}<\sum_{\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}\\ 0, & \text{otherwise}\end{cases} \tag{35}$$

S1064: choosing the value of the action vector α(t) according to the random event vector, the first virtual value at the current time slot, the second virtual value at the current time slot, the third virtual value at the current time slot and the third penalty upper bound term, to solve:

$$\begin{aligned}
\text{Maximize}\quad & -\sum_{i\in\mathcal{I}}\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}u_{i,\upsilon_i}^{(\beta_i)}(t)+\sum_{i\in\mathcal{I}}\big\{[Q_i(t)]^{+}+Z_i(t)+[H_i(t)]^{+}\big\}u_i(t)\\
\text{subject to:}\quad & \alpha(t)\in\mathcal{A}(t)
\end{aligned}\tag{36}$$

S1065: sending α_(i)(t) to each player 203-i, so that the player 203-i determines the network to access according to the action α_(i)(t).

The individual utilities u_(i)(t) and u_(i,υ_i)^((β_i))(t) are computed according to formulas (14) and (18), respectively. The virtual queues Q_(i)(t), D_(i,υ_i)^((β_i))(t), Z_(i)(t) and H_(i)(t) are updated according to formulas (22), (24), (25) and (26), respectively.

The problem (36) is a nonlinear integer programming problem, where u_(i)(t) and u_(i,υ_i)^((β_i))(t) are complicated nonlinear functions of α(t). The exhaustive algorithm for solving the problem (36) has a complexity of

$$O\Big(\prod_{i\in\mathcal{I}}|\mathcal{A}_i(t)|\Big),$$

which increases exponentially with the number of users. Although heuristic algorithms (e.g., a genetic algorithm) can be leveraged to mitigate this problem, they may require a long processing time due to their slow convergence. To accelerate the optimization process, this embodiment designs a linear approximation mechanism for the problem (36).

According to Definition 1, a network j∈𝒥\{0} may be congested when the following condition holds: ∑_(i∈ℐ)1{α_(i)(t)=j}R_(i)(t)>x_(b)C_(j)(t). To avoid this situation, the suggested action vector α(t) formulated by the game manager 202 should satisfy the first action vector constraint:

$$\sum_{i\in\mathcal{I}}1\{\alpha_i(t)=j\}R_i(t)\le x_bC_j(t),\quad\forall j\in\mathcal{J}\setminus\{0\} \tag{37}$$
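Checking the first action vector constraint (37) for a candidate α(t) only requires aggregating the rates per network. In this sketch, network indices are integers with 0 as the empty network, and `thresholds[j]` holds x_(b)C_(j)(t); the names and the data layout are hypothetical:

```python
def network_loads(alpha, rates):
    """Sum of R_i(t) over users suggested to each non-empty network j."""
    load = {}
    for j, r in zip(alpha, rates):
        if j != 0:  # the empty network 0 carries no load
            load[j] = load.get(j, 0.0) + r
    return load


def satisfies_37(alpha, rates, thresholds):
    """True if sum_i 1{alpha_i(t) = j} R_i(t) <= x_b C_j(t) for every j != 0."""
    load = network_loads(alpha, rates)
    return all(load.get(j, 0.0) <= cap for j, cap in thresholds.items())


thresholds = {1: 8.0, 2: 4.0}  # x_b * C_j(t) for a cellular (1) and a drone-cell (2) network
print(satisfies_37([1, 2, 1, 0], [3.0, 4.0, 5.0, 9.0], thresholds))  # True
print(satisfies_37([1, 2, 1, 1], [3.0, 4.0, 5.0, 9.0], thresholds))  # False
```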

For each participating player 203-i∈ℐ, according to Definition 1, u_(i)(t)=0 if α_(i)(t)=0. According to both the constraint (37) and Definition 1, u_(i)(t)=R_(i)(t) if α_(i)(t)≠0. Therefore, the utility function u_(i)(t) in both cases can be calculated as follows, thereby forming a mapping table between the individual utilities and the transmission rates of the participating players:

$$u_i(t)=1\{\alpha_i(t)\ne 0\}R_i(t) \tag{38}$$

For each non-participating player 203-i∈ℐ, and for each υ_(i)∈Ω_(i)^(s) and β_(i)∈𝒜_(i)^(s) (β_(i)≠0), this embodiment considers the definition (18) of u_(i,υ_i)^((β_i))(t). If υ_(i)≠ω_(i)(t), then u_(i,υ_i)^((β_i))(t)=0. If υ_(i)=ω_(i)(t), then u_(i,υ_i)^((β_i))(t)=û_(i)((β_(i),α_(ī)(t)), ω(t)). This embodiment estimates u_(i,υ_i)^((β_i))(t) in the following two cases, thereby forming a mapping table between the individual utilities and the transmission rates of the non-participating players.

1) If the player 203-i accesses exactly the network suggested by the game manager 202, i.e., β_(i)=α_(i)(t), then u_(i,υ_i)^((β_i))(t)=R_(i)(t).

2) If the player 203-i accesses a network other than the one suggested by the game manager 202, i.e., β_(i)≠α_(i)(t), then this embodiment estimates the effective transmission ratio of the network j=β_(i) at time slot t, denoted by θ_(i,β_i)(t). Specifically, this embodiment defines the remaining capacity of the network j∈𝒥\{0} at time slot t as C_(j)^(r)(t)=x_(b)C_(j)(t)−∑_(k∈ℐ)1{α_(k)(t)=j}R_(k)(t), and assumes that C_(j)^(r)(t)=C_(j)^(r)(t−1). As a result, u_(i,υ_i)^((β_i))(t) takes the following form:

$$u_{i,\upsilon_i}^{(\beta_i)}(t)=\theta_{i,\beta_i}(t)R_i(t),\qquad \theta_{i,\beta_i}(t)=f\left(\frac{\big[x_bC_{\beta_i}(t)-C_{\beta_i}^{r}(t-1)\big]^{+}+R_i(t)}{C_{\beta_i}(t)}\right) \tag{39}$$

where C_(β_i)(t) represents the capacity of the network corresponding to β_(i).

Therefore, u_(i,υ_i)^((β_i))(t) can be estimated by:

$$u_{i,\upsilon_i}^{(\beta_i)}(t)=\begin{cases}0, & \upsilon_i\ne\omega_i(t)\\ R_i(t), & \upsilon_i=\omega_i(t),\ \beta_i=\alpha_i(t)\\ \theta_{i,\beta_i}(t)R_i(t), & \upsilon_i=\omega_i(t),\ \beta_i\ne\alpha_i(t)\end{cases} \tag{40}$$
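The case analysis (40), together with (39), can be sketched as follows. The mapping f from the load ratio to the effective transmission ratio is defined outside this section, so it is passed in as a parameter, and the example `f_example` below is purely hypothetical. The sketch also assumes β_(i) has already been resolved to a concrete network index (for "drone-cell", an accessible drone-cell chosen uniformly at random, as described above):

```python
def estimate_dev_utility(upsilon, omega_i, beta, alpha_i, rate, cap_beta, prev_remaining, x_b, f):
    """Estimate u_{i,upsilon}^{(beta)}(t) according to (40).

    beta, alpha_i:  deviation network and suggested network of user i
    rate:           R_i(t)
    cap_beta:       C_beta(t), capacity of the deviation network
    prev_remaining: C_beta^r(t-1), remaining capacity of that network at slot t-1
    f:              effective-ratio function of (39), supplied by the caller
    """
    if upsilon != omega_i:
        return 0.0
    if beta == alpha_i:
        return rate
    # load ratio if user i joins network beta, assuming C^r(t) = C^r(t-1)
    ratio = (max(x_b * cap_beta - prev_remaining, 0.0) + rate) / cap_beta
    return f(ratio) * rate  # theta_{i,beta}(t) * R_i(t), per (39)


def f_example(ratio):
    """Hypothetical f: full rate below capacity, proportional sharing above it."""
    return 1.0 if ratio <= 1.0 else 1.0 / ratio


ev = ("1", 2)
print(estimate_dev_utility(ev, ev, 2, 1, 2.0, 10.0, 4.0, 0.9, f_example))  # 2.0
```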

Next, this embodiment transforms the problem (36) into an integer linear programming problem by introducing a set of auxiliary variables a_(ij), where {a_(ij)} is a suggestion matrix. For all i∈ℐ and j∈𝒥, this embodiment defines the mapping relationship between the suggestion matrix and the action vector as:

$$a_{ij}=\begin{cases}1, & \alpha_i(t)=j\\ 0, & \text{otherwise}\end{cases}$$

where a_(ij)=1 indicates that the game manager 202 suggests that the player 203-i access the network j, and a_(ij)=0 indicates that the game manager 202 suggests that the player 203-i not access the network j. According to the definition of a_(ij) and the constraint α(t)∈𝒜(t), the suggestion matrix constraint is obtained:

$$\sum_{j\in\mathcal{J}}a_{ij}=1,\quad\forall i\in\mathcal{I} \tag{41}$$

$$a_{ij}\in\{0,1\},\quad\forall i\in\mathcal{I},\ j\in\mathcal{J} \tag{42}$$

$$a_{ij}=0,\quad\forall i\in\mathcal{I},\ j\notin\mathcal{A}_i(t) \tag{43}$$

Substituting the variables a_(ij) for α(t) in (37), the second action vector constraint is obtained:

$$\sum_{i\in\mathcal{I}}a_{ij}R_i(t)\le x_bC_j(t),\quad\forall j\in\mathcal{J}\setminus\{0\} \tag{44}$$

Substituting the variables a_(ij) for α(t) in (38), the individual utilities of the participating players are obtained:

$$u_i(t)=R_i(t)\sum_{j\in\mathcal{J}\setminus\{0\}}a_{ij},\quad\forall i\in\mathcal{I} \tag{45}$$

Furthermore, this embodiment lets

$$\theta_{ij}^{(\beta_i)}=\begin{cases}1, & j=\beta_i\\ \theta_{i,\beta_i}(t), & j\ne\beta_i\end{cases}$$

for all i∈ℐ, j∈𝒥 and β_(i)∈𝒜_(i)^(s). Substituting the variables a_(ij) for α(t) in (40), the individual utilities of the non-participating players are obtained:

$$u_{i,\upsilon_i}^{(\beta_i)}(t)=R_i(t)\,1\{\omega_i(t)=\upsilon_i\}\sum_{j\in\mathcal{J}}\theta_{ij}^{(\beta_i)}a_{ij},\quad\forall i\in\mathcal{I},\ \upsilon_i\in\Omega_i^{s},\ \beta_i\in\mathcal{A}_i^{s} \tag{46}$$

According to (41)-(46), this embodiment transforms the problem (36) into the following integer linear programming problem:

$$\begin{aligned}
\underset{\{a_{ij}\}}{\text{Maximize}}\quad & \sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}c_{ij}a_{ij}\\
\text{subject to:}\quad & \sum_{i\in\mathcal{I}}R_i(t)a_{ij}\le x_bC_j(t),\quad\forall j\in\mathcal{J}\setminus\{0\}\\
& \sum_{j\in\mathcal{J}}a_{ij}=1,\quad\forall i\in\mathcal{I}\\
& a_{ij}\in\{0,1\},\quad\forall i\in\mathcal{I},\ j\in\mathcal{J}\\
& a_{ij}=0,\quad\forall i\in\mathcal{I},\ j\notin\mathcal{A}_i(t)
\end{aligned}\tag{47}$$

where the weights c_(ij) are defined as:

$$c_{ij}=\begin{cases}\big(E_i(t)-F_{ij}(t)\big)R_i(t), & \forall j\in\mathcal{J}\setminus\{0\}\\ -F_{ij}(t)R_i(t), & j=0\end{cases} \tag{48}$$

with

$$E_i(t)=[Q_i(t)]^{+}+Z_i(t)+[H_i(t)]^{+}$$

and

$$F_{ij}(t)=\sum_{\upsilon_i\in\Omega_i^{s},\,\beta_i\in\mathcal{A}_i^{s}}[D_{i,\upsilon_i}^{(\beta_i)}(t)]^{+}\,\theta_{ij}^{(\beta_i)}\,1\{\omega_i(t)=\upsilon_i\}.$$

Here, ∑_(i∈ℐ)∑_(j∈𝒥)c_(ij)a_(ij) is the fourth penalty upper bound term, the inequality ∑_(i∈ℐ)R_(i)(t)a_(ij)≤x_(b)C_(j)(t), ∀j∈𝒥\{0} constitutes the second action vector constraint, and the conditions (41)-(43) constitute the suggestion matrix constraint.

Note that at the initial time slot (t=0), all weights c_(ij) will bezero since all virtual queues are initialized to zero. To handle thisproblem, this embodiment defines the weights c_(ij) at time slot t=0 as:

$\begin{matrix}{c_{ij} = \{ \begin{matrix}{{R_{i}(0)},} & {\forall{j \in {\text{\textbackslash}\{ 0 \}}}} \\{0,} & {j = 0}\end{matrix} } & (49)\end{matrix}$

Problem (47) is an integer linear programming problem with respect to the auxiliary variables $a_{ij}$ and can be solved with the MOSEK Optimization Tools. The mixed-integer optimizer in MOSEK leverages a branch-and-bound scheme, which relaxes the integrality of the variables so that the integer linear optimization problem is handled through a sequence of solvable linear optimization problems.
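For a handful of users, the optimum of problem (47) can also be checked by exhaustive enumeration. The following Python sketch is such a stand-in for illustration only: it is neither the branch-and-bound procedure nor MOSEK's API, its cost grows exponentially with the number of users, and the name `solve_problem_47`, the dictionary inputs, and the assumption that every $A_i(t)$ contains the empty network $j=0$ are hypothetical. It returns the action vector directly, reading $\alpha_i(t)$ as the network j with $a_{ij}=1$.

```python
from itertools import product


def solve_problem_47(c, R, C, accessible, x_b=0.9):
    """Exhaustively solve the integer program (47) for a small instance.

    c: dict (i, j) -> c_ij; R: dict i -> R_i(t)
    C: dict j -> C_j(t) for every non-empty network j != 0
    accessible: dict i -> accessible set A_i(t)
    Returns (objective, action) with action[i] = alpha_i(t), i.e. a_ij = 1
    exactly when action[i] == j; returns (None, None) if infeasible.
    """
    users = sorted(R)
    best_value, best_action = None, None
    # One network per user enforces sum_j a_ij = 1 and a_ij = 0 for j not in A_i(t).
    for choice in product(*(sorted(accessible[i]) for i in users)):
        load = {j: 0.0 for j in C}
        for i, j in zip(users, choice):
            if j != 0:
                load[j] += R[i]
        # Capacity constraint: sum_i R_i(t) a_ij <= x_b C_j(t) for all j != 0.
        if any(load[j] > x_b * C[j] for j in C):
            continue
        value = sum(c[i, j] for i, j in zip(users, choice))
        if best_value is None or value > best_value:
            best_value, best_action = value, dict(zip(users, choice))
    return best_value, best_action
```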

This embodiment uses the Lyapunov optimization method and the linear approximation mechanism to transform the constructed problem without requiring knowledge of the state probability π[ω] of the networks and the users, thereby greatly reducing the computational complexity of the problem.

Based on the main framework for solving problem (21), and combined with the solutions of (32), (34) and (36), this embodiment proposes a network selection apparatus for the integrated cellular and drone-cell networks, shown in FIG. 4, based on the efficient and fair network selection (EFNS) method.

FIG. 4 is a schematic structural diagram of a network selection apparatus for the integrated cellular and drone-cell networks according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, a network selection apparatus 300 includes a transceiver 311 and a memory 312. The transceiver 311 is used for collecting the drone-cell network capacity information $C_j(t)$, $j\in\mathcal{J}_d$, the cellular network capacity information $C_1(t)$, the users' accessible network set information $A_i(t)$, $i\in\mathcal{I}$, and the data transmission rate information $R_i(t)$, $i\in\mathcal{I}$, to form the random event vector ω(t). The transceiver 311 is also responsible for transmitting the suggested action information formulated by the game manager 202 to each user device 330. The memory 312 can be any type of computer-readable medium and is responsible for storing information such as parameters, state data, action data, and virtual queue data. The network selection apparatus 300 further includes a processor 313. The processor 313 can be any type of central processing unit and is responsible for processing data in the EFNS method. Specifically, the processor 313 obtains the action vector according to the drone-cell network capacity information, the cellular network capacity information, the users' accessible network set information, the data transmission rate information, and the fourth selection model. Each user device 330 includes a network access system 331 for controlling the user device 330 to access a network according to the received action. The fourth selection model has been described in detail in the foregoing embodiment, and details are not described herein again.

Referring further to FIG. 4, the network selection apparatus provided by this embodiment also includes a human-computer interaction module 314, which includes a display and an operator input interface. The display presents the results to a computer operator 340. The operator input interface obtains information input by the computer operator 340 from one or more input devices such as a keyboard and a mouse.

Referring further to FIG. 4, the network selection apparatus provided by this embodiment further includes networks 320. FIG. 5 is a flowchart of the method performed by the processor in the network selection apparatus for the integrated cellular and drone-cell networks according to the embodiment shown in FIG. 4 of the present disclosure. As shown in FIG. 5, the processor 313 performs the following actions:

The inputs are the state information ω(t) and the virtual queue information $Q_i(t)$, $D_{i,\upsilon_i}^{(\beta_i)}(t)$, $Z_i(t)$, and $H_i(t)$ at the current time slot t. The outputs are the suggested action vector α(t) of the game manager 202 at the current time slot t and the virtual queue information $Q_i(t+1)$, $D_{i,\upsilon_i}^{(\beta_i)}(t+1)$, $Z_i(t+1)$, and $H_i(t+1)$ of the next time slot t+1.

S201: obtaining an upper bound $u_i^{\max}$ of the data transmission rate, a segment value $K_R$, and a penalty coefficient V, and initializing the first to third virtual queues.

More specifically, these parameters are stored in the memory 312. They can be given default values, and the computer operator 340 can modify them through the human-computer interaction module 314. The virtual queues are initialized as $Q_i(0)=0$, $D_{i,\upsilon_i}^{(\beta_i)}(0)=0$, $Z_i(0)=0$ and $H_i(0)=0$ and are stored in the memory 312.

Steps S202 to S207 are repeated at each time slot, where T is the total number of time slots.

S202: collecting state information of the networks and the users to form a random event vector.

More specifically, the processor 313 collects the state information ω(t)∈Ω of the networks and the users through the transceiver 311.

ω(t) is temporarily stored in the memory 312 until the end of Step S206.

S203: obtaining the third auxiliary variables $\gamma_i(t)$ according to the second virtual value at the current time slot and the first penalty upper bound term.

More specifically, for each $i\in\mathcal{I}$, the processor 313 calculates the third auxiliary variables $\gamma_i(t)$ according to formula (33). The third auxiliary variables $\gamma_i(t)$ are temporarily stored in the memory 312 until the end of Step S206.
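Formula (33) is not reproduced in this section. As a hedged illustration only: if the proportional fairness function were taken to be $\varphi(\gamma_1,\ldots,\gamma_N)=\sum_i\log(1+\gamma_i)$ (an assumption; function (12) may differ), then minimizing the γ-dependent terms of the bound in claim 8, $-V\varphi(\gamma_1(t),\ldots,\gamma_N(t))+\sum_i Z_i(t)\gamma_i(t)$, over $0\le\gamma_i(t)\le u_i^{\max}$ separates per user and has the closed form sketched below. The function name `third_auxiliary` is hypothetical.

```python
def third_auxiliary(Z, V, u_max):
    """Minimize -V*log(1 + g) + Z_i*g over 0 <= g <= u_max for each user i.

    Illustrative only: assumes phi(gamma) = sum_i log(1 + gamma_i).
    Z: dict user -> Z_i(t) (the second virtual value may be negative).
    """
    gamma = {}
    for i, z in Z.items():
        if z <= 0:
            # The objective then decreases in g, so the upper bound is optimal.
            gamma[i] = u_max
        else:
            # Stationary point V/z - 1 of the convex objective, clipped to the box.
            gamma[i] = min(max(V / z - 1.0, 0.0), u_max)
    return gamma
```

With the simulation settings V=100 and $u_i^{\max}=20$, a user with $Z_i(t)=10$ would receive $\gamma_i(t)=9$ under this assumed φ.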

S204: obtaining the second auxiliary variables according to the random event vector at the current time slot, the first virtual value at the current time slot and the second penalty upper bound term.

More specifically, for each $i\in\mathcal{I}$ and $\upsilon_i\in\Omega_i^s$, the processor 313 calculates the second auxiliary variables $\phi_{i,\upsilon_i}(t)$ by formula (35). The second auxiliary variables $\phi_{i,\upsilon_i}(t)$ are temporarily stored in the memory 312 until the end of Step S206.
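Formula (35) is likewise not reproduced here. The φ-dependent terms of the bound in claim 8, $\sum_i [Q_i(t)]^+\sum_{\upsilon_i}\phi_{i,\upsilon_i}(t) - \sum_i\sum_{\upsilon_i,\beta_i}[D^{(\beta_i)}_{i,\upsilon_i}(t)]^+\phi_{i,\upsilon_i}(t)$, are linear, and the box constraint $0\le\phi_{i,\upsilon_i}(t)\le u_i^{\max}1\{\omega_i(t)=\upsilon_i\}$ from claim 7 makes each variable independent, so a per-slot minimizer sits at one of the two endpoints. The Python sketch below derives this directly from the bound; it is not guaranteed to coincide with (35), and the function and argument names are hypothetical.

```python
def plus(x):
    """Positive part [x]^+ = max{x, 0}."""
    return max(x, 0.0)


def second_auxiliary(omega, Q, D, events, actions, u_max):
    """Endpoint minimizer of the linear phi-terms of the claim-8 bound.

    omega: dict user -> omega_i(t); Q: dict user -> Q_i(t)
    D: dict (user, event, action) -> D_{i,v}^{(b)}(t)
    events: dict user -> Omega_i^s; actions: dict user -> A_i^s
    """
    phi = {}
    for i in omega:
        for v in events[i]:
            if omega[i] != v:
                phi[i, v] = 0.0  # the upper bound u_max * 1{omega_i(t) = v} is zero
                continue
            coeff = plus(Q[i]) - sum(plus(D[i, v, b]) for b in actions[i])
            # Minimizing coeff * phi over [0, u_max]: pick u_max only if coeff < 0.
            phi[i, v] = u_max if coeff < 0 else 0.0
    return phi
```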

S205: obtaining a suggestion matrix according to the fourth penalty upper bound term, the second action vector constraint and the suggestion matrix constraint, and obtaining an action vector according to the suggestion matrix.

More specifically, the processor 313 obtains the suggestion matrix $\{a_{ij}\}$ by solving problem (47) and obtains the action vector α(t) from the suggestion matrix, where $\alpha_i(t)$ is the network j for which $a_{ij}=1$. Then the transceiver 311 sends the suggested action $\alpha_i(t)$ to the network access system 331 of each user device 330-i. The suggested action vector α(t) is temporarily stored in the memory 312 until the end of Step S206.

S206: calculating the individual utilities and updating the first to third virtual queues.

More specifically, the processor 313 calculates $u_i(t)$ and $u_{i,\upsilon_i}^{(\beta_i)}(t)$ using formulas (4) and (18), respectively, calculates $Q_i(t+1)$, $D_{i,\upsilon_i}^{(\beta_i)}(t+1)$, $Z_i(t+1)$ and $H_i(t+1)$ using formulas (22), (24), (25) and (26), and updates the virtual queues in the memory 312.
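Formulas (22), (24), (25) and (26) are not reproduced in this section apart from the opening of (22) in claim 8. The following Python sketch infers one-slot updates from the structure of the bound in claim 8, in which each queue multiplies the per-slot violation of its own constraint: $Q_i$ accumulates $\sum_{\upsilon_i}\phi_{i,\upsilon_i}(t)-u_i(t)$, $D^{(\beta_i)}_{i,\upsilon_i}$ accumulates $u^{(\beta_i)}_{i,\upsilon_i}(t)-\phi_{i,\upsilon_i}(t)$, $Z_i$ accumulates $\gamma_i(t)-u_i(t)$, and $H_i$ accumulates $u_i^c-u_i(t)$. Leaving the queues unclipped, because only their positive parts (and $Z_i$ itself) enter L(t), is likewise an assumption; the function name is hypothetical.

```python
def update_queues(Q, D, Z, H, u, u_dev, phi, gamma, u_c):
    """One-slot virtual-queue updates inferred from the bound in claim 8.

    u: dict i -> u_i(t); u_dev: dict (i, v, b) -> u_{i,v}^{(b)}(t)
    phi: dict (i, v) -> phi_{i,v}(t); gamma: dict i -> gamma_i(t)
    u_c: dict i -> u_i^c for users in S_u (the keys of H)
    """
    # First virtual queue: second auxiliary variables minus the realized utility.
    Q_next = {i: Q[i] + sum(p for (k, _), p in phi.items() if k == i) - u[i] for i in Q}
    # Deviation queues: utility under the preset action minus the auxiliary variable.
    D_next = {(i, v, b): D[i, v, b] + u_dev[i, v, b] - phi[i, v] for (i, v, b) in D}
    # Second virtual queue: tracks gamma_i(t) against u_i(t) (can become negative).
    Z_next = {i: Z[i] + gamma[i] - u[i] for i in Z}
    # Third virtual queue: shortfall against the minimum utility u_i^c.
    H_next = {i: H[i] + u_c[i] - u[i] for i in H}
    return Q_next, D_next, Z_next, H_next
```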

S207: judging whether the time slot t reaches a preset time slot value; if not, turning to S208; otherwise, stopping the loop.

S208: updating the time slot t ← t+1 and turning to S202.

The following presents a simulation that uses the EFNS method provided by this embodiment to perform network selection for the integrated cellular and drone-cell networks.

In order to verify the effectiveness of the network selection method, this embodiment designs three benchmark methods for comparison: the cellular-only (CO) method, the random access (RA) method and the on-the-spot offloading (OTSO) method. In the CO method, at each time slot, every user always accesses the cellular network. In the RA method, at each time slot, every user accesses an accessible non-empty network chosen randomly with equal probability. In the OTSO method, at each time slot, every user checks whether any drone-cell network is accessible; if so, the user accesses an accessible drone-cell network chosen randomly with equal probability; otherwise, the user accesses the cellular network.
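The three benchmark rules can be sketched per user and per time slot as follows. The indexing (0 for the empty network, 1 for the cellular network, 2 and above for drone-cells) and the function names are assumptions for illustration; each function receives only the user's accessible network set and, as described above, ignores congestion.

```python
import random

EMPTY, CELLULAR = 0, 1  # assumed indices; drone-cell networks are >= 2


def cellular_only(accessible, rng=random):
    """CO: always access the cellular network (arguments kept for a uniform signature)."""
    return CELLULAR


def random_access(accessible, rng=random):
    """RA: pick an accessible non-empty network with equal probability."""
    return rng.choice(sorted(j for j in accessible if j != EMPTY))


def on_the_spot_offloading(accessible, rng=random):
    """OTSO: pick an accessible drone-cell uniformly if one exists, else cellular."""
    drones = sorted(j for j in accessible if j not in (EMPTY, CELLULAR))
    return rng.choice(drones) if drones else CELLULAR
```

Passing a seeded `random.Random` instance as `rng` makes repeated simulation runs reproducible.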

The parameter settings in the simulation are summarized as follows. The size of the considered geographic area is 500×500 m², i.e., L=500 m and V=500 m. The repeated stochastic game lasts for 1000 seconds, and the duration of a time slot is one second; thus, the simulation runs for 1000 episodes, i.e., T=1000. In the location model of the drone-cell networks, the initial locations of the drones are distributed in the considered area independently and uniformly, their initial heading angles are independent and uniformly distributed on [0, 2π), and the parameters are $(V_d, \lambda_d, \sigma_d^2)=(10, 0.1, 0.02)$. In the network capacity model, the capacity (in Mb/s) of the cellular network follows a truncated Gaussian distribution $N_{tru}(200, 20^2, \pm 40)$, and the capacity (in Mb/s) of each drone-cell network independently follows a truncated Gaussian distribution $N_{tru}(30, 3^2, \pm 6)$. Furthermore, the coverage radius of a drone-cell is $R_d=100$ m.
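The capacity draws can be reproduced with simple rejection sampling. The following Python sketch reads $N_{tru}(\mu,\sigma^2,\pm d)$ as a Gaussian with mean μ and variance σ² restricted to $[\mu-d,\mu+d]$; that reading of the notation and the function name are assumptions.

```python
import random


def truncated_gauss(mean, std, half_width, rng=random):
    """Draw from N(mean, std^2) conditioned on |x - mean| <= half_width (rejection sampling)."""
    while True:
        x = rng.gauss(mean, std)
        if abs(x - mean) <= half_width:
            return x


rng = random.Random(2018)
cellular_capacity = truncated_gauss(200, 20, 40, rng)                  # Mb/s
drone_capacities = [truncated_gauss(30, 3, 6, rng) for _ in range(6)]  # Mb/s, M_d = 6
```

Both distributions are truncated at ±2σ, so roughly 95% of proposals are accepted and rejection sampling stays cheap.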

In the location model of the users, the initial locations of the users are distributed in the considered area independently and uniformly, and the initial velocities of the users are independent and follow the 2-D Gaussian distribution N(0, 0; 2², 2², 0), with parameters $\alpha_u=(0.73, 0.73)$ and $\sigma_u=(2, 2)$. In the transmission rate model of the users, the parameter $\rho_R=0.2$, and the process $\mu_R(t)$ (in Mb/s) takes values from the set $\{\mu_1, \mu_2, \ldots, \mu_5\}=\{2.5, 5, 7.5, 10, 12.5\}$. The one-step transition probability matrix P of $\mu_R(t)$ is shown in Table 1:

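TABLE 1 One-step transition probability matrix P of $\mu_R(t)$ (rows: current state; columns: next state; both ordered $\mu_1, \ldots, \mu_5$)

$P = \begin{bmatrix}0.8 & 0.2 & 0 & 0 & 0\\ 0.2 & 0.6 & 0.2 & 0 & 0\\ 0 & 0.2 & 0.6 & 0.2 & 0\\ 0 & 0 & 0.2 & 0.6 & 0.2\\ 0 & 0 & 0 & 0.2 & 0.8\end{bmatrix}$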
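A sample path of $\mu_R(t)$ can be drawn directly from the transition matrix. The Python sketch below uses the matrix with rows indexing the current state and columns the next state, each row summing to one. It covers only the modulating process $\mu_R(t)$, not how $R_i(t)$ is formed from it with $\rho_R$. The function name and the starting state are assumptions.

```python
import random

MU = [2.5, 5.0, 7.5, 10.0, 12.5]  # states mu_1..mu_5 of mu_R(t), in Mb/s
P = [  # Table 1: P[k][l] = Pr{next state l | current state k}
    [0.8, 0.2, 0.0, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.8],
]


def simulate_mu(T, start=0, rng=random):
    """Return mu_R(0), ..., mu_R(T-1) for a chain started in state index `start`."""
    state, path = start, []
    for _ in range(T):
        path.append(MU[state])
        # Draw the next state index using row `state` of P as weights.
        state = rng.choices(range(len(MU)), weights=P[state])[0]
    return path
```

Because P is tridiagonal, the rate level moves by at most one step per time slot.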

For the function ƒ(x) in Definition 1, this embodiment sets the network busyness ratio threshold $x_b=0.9$. For the MTAU constraints, this embodiment sets $S_u=\{1, \ldots, 10\}$ and $u_i^c=6$ for all $i\in S_u$. For the network selection method provided by the present disclosure, $u_i^{\max}=20$ for all $i\in\mathcal{I}$, $K_R=5$, and the penalty coefficient V=100.

This embodiment uses the following four indexes to evaluate the performance of the proposed method:

Queue Stability: this embodiment uses the stability variables defined as

${{s_{Q}(t)} = \frac{\max\limits_{i \in \mathcal{I}}\lbrack {Q_{i}(t)} \rbrack^{+}}{t}},{{s_{D}(t)} = \frac{\max\limits_{{i \in \mathcal{I}},{\upsilon_{i} \in \Omega_{i}^{s}},{\beta_{i} \in A_{i}^{s}}}\lbrack {D_{i,\upsilon_{i}}^{(\beta_{i})}(t)} \rbrack^{+}}{t}},{{s_{Z}(t)} = \frac{\max\limits_{i \in \mathcal{I}}{{Z_{i}(t)}}}{t}},{and}$${s_{H}(t)} = \frac{\max\limits_{l \in s_{u}}\lbrack {H_{i}(t)} \rbrack^{+}}{t}$

over time slots t=1, 2, . . . , T−1 to measure the stability of thequeues of the EFNS method.
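Given snapshots of the virtual queues, the four stability variables can be evaluated as in the following Python sketch. The dictionary layout and the function name are assumptions, and t must be a positive slot index.

```python
def plus(x):
    """Positive part [x]^+ = max{x, 0}."""
    return max(x, 0.0)


def stability(t, Q, D, Z, H):
    """Return (s_Q, s_D, s_Z, s_H) at slot t >= 1.

    Q, Z: dict user -> value for all users; D: dict (i, v, b) -> value;
    H: dict user -> value for users in S_u.
    """
    s_Q = max(plus(q) for q in Q.values()) / t
    s_D = max(plus(d) for d in D.values()) / t
    s_Z = max(abs(z) for z in Z.values()) / t  # Z_i may be negative
    s_H = max(plus(h) for h in H.values()) / t
    return s_Q, s_D, s_Z, s_H
```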

Running Time: this is the total time of executing the EFNS method for T=1000 episodes.

Total Utility: this is the total utility of all users during the entire simulation process, i.e., $\sum_{t=0}^{T-1}\sum_{i\in\mathcal{I}}u_i(t)$.

Fairness: this embodiment uses Jain's fairness index, defined as

$\Big(\sum_{i\in\mathcal{I}}\bar{u}_i\Big)^2 \Big/ \Big(N\sum_{i\in\mathcal{I}}\bar{u}_i^2\Big),$

to measure the fairness of the network resource allocation, where $\bar{u}_i$ represents the time average utility of user i during the entire simulation process, i.e.,

$\bar{u}_i = \frac{1}{T}\sum_{t=0}^{T-1}u_i(t).$
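The total utility and Jain's fairness index can be computed from a log of per-slot utilities, as in the following Python sketch; storing the log as a T-by-N nested list `u[t][i]` is an assumption. The index equals 1 when all time average utilities are equal and 1/N when a single user receives all the utility.

```python
def total_utility(u):
    """Sum of u_i(t) over all slots t and users i; u[t][i] holds u_i(t)."""
    return sum(sum(row) for row in u)


def time_average(u):
    """Time average utility u_bar_i = (1/T) * sum_t u_i(t) for each user i."""
    T = len(u)
    return [sum(row[i] for row in u) / T for i in range(len(u[0]))]


def jain_index(u):
    """Jain's fairness index (sum_i u_bar_i)^2 / (N * sum_i u_bar_i^2)."""
    u_bar = time_average(u)
    N = len(u_bar)
    return sum(u_bar) ** 2 / (N * sum(x * x for x in u_bar))
```

The index is undefined when every user's time average utility is zero, so a caller may need to guard that case.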

In the simulation, this embodiment tests all comparison methods on one hundred randomly generated data sets. Each method therefore yields one hundred results, and the reported result is their average.

FIG. 6 illustrates the change of the stability variables over time when the network selection method according to the present disclosure is used with N=50 users and $M_d=6$ drones. As shown in FIG. 6:

all stability variables decrease rapidly as the time slot t increases and approach zero after a long period of time. This result indicates that the EFNS method guarantees that all queues are mean-rate stable, and thus the constraints (21.1), (21.1), (21.1) and (21.1) are satisfied.

FIG. 7 illustrates the effect of the number of users N on the running time of the network selection method proposed by the present disclosure when the number of drones is $M_d=6$. FIG. 8 illustrates the effect of the number of drones $M_d$ on the running time of the network selection method proposed by the present disclosure when the number of users is N=50. As shown in FIG. 7 and FIG. 8:

the average running time of the EFNS method increases as N or $M_d$ increases, because the scale of the problem grows with both. Nevertheless, the EFNS method remains capable of online network selection.

FIG. 9 illustrates the effect of the number of users N on the total user utilities obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of drones is $M_d=6$. FIG. 10 illustrates the effect of the number of drones $M_d$ on the total user utilities obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of users is N=50. As shown in FIG. 9 and FIG. 10:

the EFNS method always achieves the highest total utility among the four methods by avoiding congestion and making full use of network resources.

For the EFNS method, the total utility increases with N, but the rate of increase slows down because the network capacity limits the total utility when N is large. For the other three methods, the total utilities begin to decrease quickly as N increases, because these methods have no mechanism for avoiding network congestion, and a large number of users may lead to congestion.

For all methods except the CO method, the total utility increases monotonically with $M_d$, since users can offload traffic to the drone-cell networks.

FIG. 11 illustrates the effect of the number of users N on the Jain's fairness indices obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of drones is $M_d=6$. FIG. 12 illustrates the effect of the number of drones $M_d$ on the Jain's fairness index obtained by the network selection method proposed by the present disclosure and the comparison methods when the number of users is N=50. As shown in FIG. 11 and FIG. 12:

the EFNS method achieves a high level of fairness. Specifically, its fairness index is close to 1 because the fair allocation of network resources is considered in the proportional fairness function (12). However, the fairness index of the EFNS method decreases gradually as N increases or $M_d$ decreases, because a large N or a small $M_d$ may activate the MTAU constraints.

Although the RA and OTSO methods do not consider fairness, they also achieve a high level of fairness. This is because, owing to the homogeneity of users in the model of this embodiment, the time average utilities of the users become close to each other after a long time.

The CO method achieves the highest level of fairness since users always have the same effective transmission ratio at each time slot. However, the CO method achieves the lowest total utility.

Finally, it should be noted that each of the above embodiments is merely intended to describe, rather than limit, the technical solutions of the present disclosure. Although the present disclosure is described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that it is possible to make modifications to the technical solutions described in the foregoing embodiments, or to make equivalent substitutions of some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions in the embodiments of the present disclosure.

What is claimed is:
 1. A network selection method for integrated cellular and drone-cell networks, comprising: acquiring a dynamic network model and a dynamic user model, wherein the dynamic network model comprises at least a location model of the drone-cell networks, a capacity model of the cellular network and a capacity model of the drone-cell networks, and the dynamic user model comprises at least a location model and a transmission rate model of users; generating accessible network sets of the users according to the location model of the drone-cell networks and the location model of the users; generating a random event vector according to the capacity model of the cellular network, the capacity model of the drone-cell networks, the accessible network sets of the users and the transmission rate model, wherein the accessible network sets of the users comprise the drone-cell networks and/or the cellular network; generating an action vector according to the random event vector, wherein the action vector is used for indicating that the users choose to access the drone-cell networks and/or the cellular network; obtaining an individual utility of each user according to the action vector and the random event vector; constructing a first selection model, wherein the first selection model comprises a first objective function and a first constraint, wherein the first objective function is a proportional fairness function with the time averages of the individual utilities of the users as independent variables, and the first constraint comprises at least a first coarse correlated equilibrium constraint, a first minimum individual time average utility constraint and a first action probability constraint, wherein the first coarse correlated equilibrium constraint is used for constraining the time average of the individual utilities and first auxiliary variables, the first minimum individual time average utility constraint is used for constraining the time average of the individual utilities, and the first action probability constraint is used for constraining the action probability under the condition of the random event vector; wherein the time average of the individual utilities is obtained according to the individual utilities, the random event probability and the action probability under the condition of the random event vector, wherein the action probability under the condition of the random event vector is the probability that the users execute the action vector under the condition that the random event vector occurs, and the random event probability is the probability that the random event vector occurs; and obtaining a value of the action probability according to the first selection model, and determining networks that the users choose to access according to the value of the action probability.
 2. The method according to claim 1, wherein after the obtaining an individual utility of each user according to the action vector and the random event vector, the method further comprises: constructing a second selection model, wherein the second selection model comprises a second objective function and a second constraint, wherein the second objective function is a proportional fairness function with the time average expectations of the individual utilities as independent variables, and the second constraint comprises at least second coarse correlated equilibrium constraints, a second minimum individual time average utility constraint and a second auxiliary variable constraint, wherein the second coarse correlated equilibrium constraints are used for constraining the time average expectation of the individual utilities and the time average expectation of the second auxiliary variables, the second minimum individual time average utility constraint is used for constraining the time average expectation of the individual utilities, and the second auxiliary variable constraint is used for constraining the second auxiliary variables; and obtaining a value of the action vector according to the second selection model, and determining networks that the users choose to access according to the value of the action vector.
 3. The method according to claim 2, wherein after the obtaining an individual utility of each user according to the action vector and the random event vector, the method further comprises: constructing a third selection model according to the second selection model, wherein the third selection model comprises a third objective function and a third constraint, wherein the third objective function is the time average expectation of a proportional fair function with third auxiliary variables as independent variables, and the third constraint comprises at least the second coarse correlated equilibrium constraints, the second minimum individual time average utility constraint, the second auxiliary variable constraint, and third auxiliary variable constraints, wherein the second coarse correlated equilibrium constraints are used for constraining the time average expectation of the individual utilities and the time average expectation of the second auxiliary variables, the second minimum individual time average utility constraint is used for constraining the time average expectation of the individual utilities, the second auxiliary variable constraint is used for constraining the second auxiliary variables, and the third auxiliary variable constraints are used for constraining the time average expectation of the third auxiliary variables and the time average expectation of the individual utilities; and obtaining a value of the action vector according to the third selection model, and determining networks that the users choose to access according to the value of the action vector.
 4. The method according to claim 3, wherein after the constructing a third selection model, the method further comprises: constructing a fourth selection model; obtaining a value of the action vector according to the fourth selection model, and determining networks that the users choose to access according to the value of the action vector; wherein the fourth selection model is that a difference value between a drift of total violation and a utility is less than or equal to a penalty upper bound; the drift of the total violation is obtained according to a value of the total violation at a current time slot and a value of the total violation at a next time slot; the value of the total violation at the current time slot is obtained according to a first virtual value at the current time slot, a second virtual value at the current time slot and a third virtual value at the current time slot; the first virtual value in a first virtual queue at the current time slot is generated according to violation of the second coarse correlated equilibrium constraints at the previous time slot and the first virtual value in the first virtual queue at the previous time slot; the second virtual value in a second virtual queue at the current time slot is generated according to violation of the third auxiliary variable constraints at the previous time slot and the second virtual value in the second virtual queue at the previous time slot; the third virtual value in a third virtual queue at the current time slot is generated according to violation of the second minimum individual time average utility constraint at the previous time slot and the third virtual value in the third virtual queue at the previous time slot, wherein the first virtual value at an initial time slot, the second virtual value at the initial time slot, and the third virtual value at the initial time slot are all zero; the utility comprises the proportional fair function, with the third auxiliary variables as independent variables, and a penalty coefficient; and the penalty upper bound comprises a constant term, a first penalty upper bound term, a second penalty upper bound term and a third penalty upper bound term, wherein the first penalty upper bound term comprises the third auxiliary variables and the second virtual value, the second penalty upper bound term comprises the second auxiliary variables and the first virtual value, and the third penalty upper bound term comprises the individual utilities, the first virtual value, the second virtual value and the third virtual value.
 5. The method according to claim 4, wherein the obtaining a value of the action vector according to the fourth selection model, and determining networks that the users choose to access according to the value of the action vector, specifically comprises: obtaining the first virtual value at the current time slot, the second virtual value at the current time slot, the third virtual value at the current time slot and the random event vector at the current time slot; obtaining values of the third auxiliary variables according to the second virtual value at the current time slot and the first penalty upper bound term; obtaining values of the second auxiliary variables according to the random event vector at the current time slot, the first virtual value at the current time slot and the second penalty upper bound term; obtaining the value of the action vector according to the random event vector at the current time slot, the first virtual value at the current time slot, the second virtual value at the current time slot, the third virtual value at the current time slot and the third penalty upper bound term, and determining the networks that the users choose to access according to the value of the action vector; wherein the obtaining the value of the action vector according to the random event vector at the current time slot, the first virtual value at the current time slot, the second virtual value at the current time slot, the third virtual value at the current time slot and the third penalty upper bound term specifically comprises: constructing a suggestion matrix constraint and a first action vector constraint, wherein a suggestion matrix represents that the users are suggested to access the drone-cell networks and/or the cellular network, and the suggestion matrix constraint is used for constraining the suggestion matrix; using a mapping table between the individual utilities and transmission rate and a mapping relationship between the suggestion matrix and the action vector to process the third penalty upper bound term, to generate a fourth penalty upper bound term; using the mapping relationship between the suggestion matrix and the action vector to process the first action vector constraint, to obtain a second action vector constraint; and obtaining a value of the suggestion matrix according to the fourth penalty upper bound term, the suggestion matrix constraint and the second action vector constraint, and determining the networks that the users choose to access according to the value of the suggestion matrix.
 6. The method according to claim 1, wherein the first objective function specifically comprises: $f_1 = \varphi(\bar{u}_1, \ldots, \bar{u}_i, \ldots, \bar{u}_N)$, wherein $f_1$ represents the first objective function, $\bar{u}_i$ represents the time average of the individual utility of the i-th user, and $\varphi(\bar{u}_1, \ldots, \bar{u}_i, \ldots, \bar{u}_N)$ represents a proportional fair function; the first coarse correlated equilibrium constraint specifically comprises:

$\begin{aligned} & \sum_{\omega\in\Omega}\sum_{\alpha\in A}\pi[\omega]\Pr[\alpha|\omega]\,\hat{u}_i(\alpha,\omega) \ge \sum_{\omega\in\Omega}\sum_{\alpha\in A}\pi[\omega]\Pr[\alpha|\omega]\,\phi_i(\omega_i),\ \forall i\in\mathcal{I};\\ & \sum_{\omega\in\Omega:\,\omega_i=\upsilon_i}\sum_{\alpha\in A}\pi[\omega]\Pr[\alpha|\omega]\,\phi_i(\upsilon_i) \ge \sum_{\omega\in\Omega:\,\omega_i=\upsilon_i}\sum_{\alpha\in A}\pi[\omega]\Pr[\alpha|\omega]\,\hat{u}_i\big((\beta_i,\alpha_{\bar{i}}),\omega\big),\ \forall i\in\mathcal{I},\ \upsilon_i\in\Omega_i^s,\ \beta_i\in A_i^s,\end{aligned}$

wherein ω represents the random event vector, α represents the action vector, π[ω] represents a probability of the random event vector, Pr[α|ω] represents an action probability under the condition of the random event vector, $\hat{u}_i(\alpha,\omega)$ represents the individual utility of the i-th user, both $\phi_i(\omega_i)$ and $\phi_i(\upsilon_i)$ represent the first auxiliary variable, $\alpha_{\bar{i}}=\alpha\setminus\{\alpha_i\}$, $\alpha_i$ represents that the i-th user chooses to access the network $j=\alpha_i$, $\omega_i$ represents the i-th element in the random event vector, $\upsilon_i$ represents a preset event of the i-th user, $\beta_i$ represents a preset action of the i-th user, A represents an available set of the action vector, Ω represents an available set of the random event vector, $A_i^s$ represents a simplified available set of the preset action of the i-th user, and $\Omega_i^s$ represents a simplified available set of the preset event of the i-th user; the first minimum individual time average utility constraint comprises: $\bar{u}_i \ge u_i^c,\ \forall i\in S_u$, wherein $u_i^c$ represents a first minimum individual time average utility, and $S_u$ represents a set of the users with minimum individual time average utility requirements; and the first action probability constraint specifically comprises:

$\begin{aligned} & \Pr[\alpha|\omega] \ge 0,\ \forall\alpha\in A,\ \omega\in\Omega;\\ & \Pr[\alpha|\omega] = 0,\ \forall\alpha\notin A(\omega),\ \omega\in\Omega;\\ & \sum_{\alpha\in A}\Pr[\alpha|\omega] = 1,\ \forall\omega\in\Omega,\end{aligned}$

wherein A(ω) represents the available set of the action vector under the random event vector ω.
7. The method according to claim 3, wherein the second objective function specifically comprises:
$$f_{2}=\liminf_{t\rightarrow\infty}\varphi\big(\bar{u}_{1}(t),\ldots,\bar{u}_{i}(t),\ldots,\bar{u}_{N}(t)\big),$$
wherein $\bar{u}_{i}(t)$ represents the time average expectation of the individual utility of the i-th user; the second coarse correlated equilibrium constraints specifically comprise:
$$\begin{aligned}
&\liminf_{t\rightarrow\infty}\Big[\bar{u}_{i}(t)-\sum_{\upsilon_{i}\in\Omega_{i}^{s}}\bar{\phi}_{i,\upsilon_{i}}(t)\Big]\geq 0, && \forall i\in\mathcal{I},\\
&\liminf_{t\rightarrow\infty}\Big[\bar{\phi}_{i,\upsilon_{i}}(t)-\bar{u}_{i,\upsilon_{i}}^{(\beta_{i})}(t)\Big]\geq 0, && \forall i\in\mathcal{I},\ \upsilon_{i}\in\Omega_{i}^{s},\ \beta_{i}\in A_{i}^{s};
\end{aligned}$$
wherein $\bar{\phi}_{i,\upsilon_{i}}(t)$ represents the time average expectation of the second auxiliary variables, $\bar{u}_{i,\upsilon_{i}}^{(\beta_{i})}(t)$ is the time average expectation of $u_{i,\upsilon_{i}}^{(\beta_{i})}(t)=\hat{u}_{i}\big((\beta_{i},\alpha_{\bar{i}}(t)),\omega(t)\big)\,1\{\omega_{i}(t)=\upsilon_{i}\}$, and $1\{\cdot\}$ represents an indicator function; the second minimum individual time average utility constraint specifically comprises:
$$\liminf_{t\rightarrow\infty}\big[\bar{u}_{i}(t)-u_{i}^{c}\big]\geq 0,\quad\forall i\in\mathcal{I}_{u};$$
the second auxiliary variable constraint specifically comprises:
$$0\leq\phi_{i,\upsilon_{i}}(t)\leq u_{i}^{\max}\,1\{\omega_{i}(t)=\upsilon_{i}\},\quad\forall t\in\{0,1,2,\ldots\},\ i\in\mathcal{I},\ \upsilon_{i}\in\Omega_{i}^{s};$$
the third objective function specifically comprises:
$$f_{3}=\liminf_{t\rightarrow\infty}\bar{g}(t),$$
wherein $g(t)=\varphi\big(\gamma_{1}(t),\ldots,\gamma_{N}(t)\big)$, $0\leq\gamma_{i}(t)\leq u_{i}^{\max}$, and $u_{i}^{\max}$ represents an upper bound of the data transmission rate required by the i-th user; the second coarse correlated equilibrium constraints specifically comprise:
$$\begin{aligned}
&\liminf_{t\rightarrow\infty}\Big[\bar{u}_{i}(t)-\sum_{\upsilon_{i}\in\Omega_{i}^{s}}\bar{\phi}_{i,\upsilon_{i}}(t)\Big]\geq 0, && \forall i\in\mathcal{I},\\
&\liminf_{t\rightarrow\infty}\Big[\bar{\phi}_{i,\upsilon_{i}}(t)-\bar{u}_{i,\upsilon_{i}}^{(\beta_{i})}(t)\Big]\geq 0, && \forall i\in\mathcal{I},\ \upsilon_{i}\in\Omega_{i}^{s},\ \beta_{i}\in A_{i}^{s};
\end{aligned}$$
the second minimum individual time average utility constraint specifically comprises:
$$\liminf_{t\rightarrow\infty}\big[\bar{u}_{i}(t)-u_{i}^{c}\big]\geq 0,\quad\forall i\in\mathcal{I}_{u};$$
the second auxiliary variable constraint specifically comprises:
$$0\leq\phi_{i,\upsilon_{i}}(t)\leq u_{i}^{\max}\,1\{\omega_{i}(t)=\upsilon_{i}\},\quad\forall t\in\{0,1,2,\ldots\},\ i\in\mathcal{I},\ \upsilon_{i}\in\Omega_{i}^{s};$$
the third auxiliary variable constraints specifically comprise:
$$\begin{aligned}
&\lim_{t\rightarrow\infty}\big|\bar{\gamma}_{i}(t)-\bar{u}_{i}(t)\big|=0, && \forall i\in\mathcal{I},\\
&0\leq\gamma_{i}(t)\leq u_{i}^{\max}, && \forall t\in\{0,1,2,\ldots\},\ i\in\mathcal{I};
\end{aligned}$$
wherein $\bar{\gamma}_{i}(t)$ represents the time average expectation of the third auxiliary variables.
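As an illustration of what the second coarse correlated equilibrium constraints require, they can be checked on a finite recorded trajectory. Since the second auxiliary variables are nonnegative and sit between the two inequalities, the tightest admissible choice is $\bar{\phi}_{i,\upsilon_i}=\max_{\beta_i}\bar{u}_{i,\upsilon_i}^{(\beta_i)}$, so both constraints for user i hold exactly when the returned slack is nonnegative. This is a hypothetical sketch, not part of the claimed method: the name `cce_slack`, the finite horizon standing in for the limit, nonnegative utilities, and the input layout (per-slot deviation utilities $\hat{u}_i((\beta_i,\alpha_{\bar{i}}(t)),\omega(t))$ supplied by the caller) are all assumptions.

```python
from typing import Hashable, Sequence


def cce_slack(
    u: Sequence[float],
    u_dev: Sequence[Sequence[float]],
    omega: Sequence[Hashable],
) -> float:
    """Empirical slack of the second CCE constraints for one user (hypothetical helper).

    u[t]:        realized individual utility u_i(t) at slot t
    u_dev[t][b]: utility at slot t if the user deviated to action b while the
                 other users kept their actions (assumed to be supplied by the caller)
    omega[t]:    the user's own random-event component omega_i(t)
    A nonnegative result means the averages over these T slots satisfy both constraints.
    """
    T = len(u)
    u_bar = sum(u) / T
    n_actions = len(u_dev[0])
    required = 0.0
    for v in set(omega):
        # Time average of u_dev * 1{omega_i(t) = v} for every deviation b;
        # the tightest feasible phi_bar_{i,v} is the largest of these.
        required += max(
            sum(u_dev[t][b] for t in range(T) if omega[t] == v) / T
            for b in range(n_actions)
        )
    return u_bar - required


if __name__ == "__main__":
    print(cce_slack([2, 2, 2, 2], [[1, 3], [1, 1], [2, 0], [2, 0]], ["a", "a", "b", "b"]))
```

With these inputs the call returns 0.0, meaning the constraints hold with equality: no single fixed deviation per event type would raise the user's average utility.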
8. The method according to claim 4, wherein the fourth selection model specifically comprises:
$$\begin{aligned}
\Delta(t)-Vg(t)\leq{}& B+\sum_{i\in\mathcal{I}}[H_{i}(t)]^{+}u_{i}^{c}-V\varphi\big(\gamma_{1}(t),\ldots,\gamma_{N}(t)\big)+\sum_{i\in\mathcal{I}}Z_{i}(t)\gamma_{i}(t)\\
&+\sum_{i\in\mathcal{I}}\Big\{[Q_{i}(t)]^{+}\sum_{\upsilon_{i}\in\Omega_{i}^{s}}\phi_{i,\upsilon_{i}}(t)\Big\}-\sum_{i\in\mathcal{I}}\sum_{\upsilon_{i}\in\Omega_{i}^{s},\,\beta_{i}\in A_{i}^{s}}[D_{i,\upsilon_{i}}^{(\beta_{i})}(t)]^{+}\phi_{i,\upsilon_{i}}(t)\\
&+\sum_{i\in\mathcal{I}}\sum_{\upsilon_{i}\in\Omega_{i}^{s},\,\beta_{i}\in A_{i}^{s}}[D_{i,\upsilon_{i}}^{(\beta_{i})}(t)]^{+}u_{i,\upsilon_{i}}^{(\beta_{i})}(t)-\sum_{i\in\mathcal{I}}\big\{[Q_{i}(t)]^{+}+Z_{i}(t)+[H_{i}(t)]^{+}\big\}u_{i}(t),
\end{aligned}$$
wherein
$$L(t)\triangleq\frac{1}{2}\sum_{i\in\mathcal{I}}\big([Q_{i}(t)]^{+}\big)^{2}+\frac{1}{2}\sum_{i\in\mathcal{I}}\sum_{\upsilon_{i}\in\Omega_{i}^{s},\,\beta_{i}\in A_{i}^{s}}\big([D_{i,\upsilon_{i}}^{(\beta_{i})}(t)]^{+}\big)^{2}+\frac{1}{2}\sum_{i\in\mathcal{I}}Z_{i}(t)^{2}+\frac{1}{2}\sum_{i\in\mathcal{I}}\big([H_{i}(t)]^{+}\big)^{2},$$
$\Delta(t)=L(t+1)-L(t)$, $[x]^{+}=\max\{x,0\}$, $Q_{i}(t)$ represents a value of the first term of the first virtual value at time slot t, $Q_{i}(t+1)=Q_{i}(t)+\sum_{\upsilon_{i}\in\Omega_{i}^{s}}\phi_{i,\upsilon_{i}}(t)-u_{i}(t)$, $D_{i,\upsilon_{i}}^{(\beta_{i})}(t)$ represents a value of the second term of the first virtual value at time slot t, $D_{i,\upsilon_{i}}^{(\beta_{i})}(t+1)=D_{i,\upsilon_{i}}^{(\beta_{i})}(t)+u_{i,\upsilon_{i}}^{(\beta_{i})}(t)-\phi_{i,\upsilon_{i}}(t)$, $\phi_{i,\upsilon_{i}}(t)$ represents a value of the second auxiliary variable at time slot t, $Z_{i}(t)$ represents a value of the second virtual value at time slot t, $Z_{i}(t+1)=Z_{i}(t)+\gamma_{i}(t)-u_{i}(t)$, $u_{i}(t)$ represents an individual utility of the i-th user at time slot t,
$$B\triangleq\sum_{i\in\mathcal{I}}\big(u_{i}^{\max}\big)^{2}+\frac{1}{2}\sum_{i\in\mathcal{I}_{u}}\big(u_{i}^{\max}\big)^{2}+\frac{1}{2}\sum_{i\in\mathcal{I}}\big|A_{i}^{s}\big|\big(u_{i}^{\max}\big)^{2},$$
$|\cdot|$ represents the number of elements in a set, $H_{i}(t)$ represents a value of the third virtual value at time slot t, and $H_{i}(t+1)=H_{i}(t)+u_{i}^{c}-u_{i}(t)$.
9. The method according to claim 5, wherein the fourth penalty upper bound term specifically comprises:
$$\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}c_{ij}a_{ij};$$
the suggestion matrix constraint specifically comprises:
$$\begin{aligned}
&\sum_{i\in\mathcal{I}}R_{i}(t)a_{ij}\leq x_{b}C_{j}(t), && \forall j\in\mathcal{J}\setminus\{0\},\\
&\sum_{j\in\mathcal{J}}a_{ij}=1, && \forall i\in\mathcal{I},\\
&a_{ij}\in\{0,1\}, && \forall i\in\mathcal{I},\ j\in\mathcal{J},\\
&a_{ij}=0, && \forall i\in\mathcal{I},\ j\notin A_{i}(t);
\end{aligned}$$
wherein
$$c_{ij}=\begin{cases}\big(E_{i}(t)-F_{ij}(t)\big)R_{i}(t), & \forall j\in\mathcal{J}\setminus\{0\},\\ -F_{ij}(t)R_{i}(t), & j=0,\end{cases}$$
$$E_{i}(t)=[Q_{i}(t)]^{+}+Z_{i}(t)+[H_{i}(t)]^{+},$$
$$F_{ij}(t)=\sum_{\upsilon_{i}\in\Omega_{i}^{s},\,\beta_{i}\in A_{i}^{s}}[D_{i,\upsilon_{i}}^{(\beta_{i})}(t)]^{+}\,\theta_{ij}^{(\beta_{i})}\,1\{\omega_{i}(t)=\upsilon_{i}\},$$
$$\theta_{ij}^{(\beta_{i})}=\begin{cases}1, & j=\beta_{i},\\ \theta_{i,\beta_{i}}(t), & j\neq\beta_{i},\end{cases}$$
$a_{ij}$ represents an element in the suggestion matrix, $\mathcal{J}$ represents a set of networks, j represents the serial number of a network, $\theta_{i,\beta_{i}}(t)$ is the effective transmission ratio of the i-th user accessing the network $j=\beta_{i}$ at time slot t, $R_{i}(t)$ represents the transmission rate of the i-th user at time slot t, $C_{j}(t)$ represents the capacity of the j-th network at time slot t, and $A_{i}(t)$ represents the set of actions available to the i-th user at time slot t.
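The weights $c_{ij}$ and the suggestion matrix constraint of claim 9 can be exercised on a toy instance with the sketch below. It is hypothetical throughout: `weights`, `best_suggestion` and the dictionary-based inputs are illustrative names, the 0-1 matrix with one 1 per row is encoded as an assignment vector `a[i] = j` (equivalent to $\sum_j a_{ij}=1$, $a_{ij}\in\{0,1\}$), and exhaustive enumeration is swapped in as the solver, which is only practical for a handful of users. This passage does not state the optimization direction; the sketch assumes the feasible matrix with the largest $\sum_{i,j}c_{ij}a_{ij}$ is selected, so reverse the comparison if the opposite convention is intended.

```python
from itertools import product


def pos(x: float) -> float:
    """[x]^+ = max{x, 0}."""
    return max(x, 0.0)


def weights(Q, Z, H, D, omega, theta_eff, R, networks):
    """c_ij of claim 9 for every user i and network j (index 0 treated separately).

    D[i]: (upsilon, beta) -> D_{i,upsilon}^{(beta)}(t)
    theta_eff[i]: beta -> theta_{i,beta}(t), the effective transmission ratio
    """
    c = []
    for i in range(len(R)):
        E = pos(Q[i]) + Z[i] + pos(H[i])
        row = {}
        for j in networks:
            F = 0.0
            for (v, b), d in D[i].items():
                if v != omega[i]:
                    continue  # indicator 1{omega_i(t) = upsilon_i} is zero
                theta = 1.0 if j == b else theta_eff[i][b]
                F += pos(d) * theta
            row[j] = (E - F) * R[i] if j != 0 else -F * R[i]
        c.append(row)
    return c


def best_suggestion(c, R, C, x_b, available):
    """Enumerate a[i] = j with j in A_i(t), drop assignments that exceed
    x_b * C_j(t) on any network j != 0 (keys of C), keep the largest sum."""
    best, best_val = None, float("-inf")
    for a in product(*(sorted(s) for s in available)):
        load = {j: 0.0 for j in C}
        for i, j in enumerate(a):
            if j in load:
                load[j] += R[i]
        if any(load[j] > x_b * C[j] for j in C):
            continue  # capacity constraint violated
        val = sum(c[i][j] for i, j in enumerate(a))
        if val > best_val:
            best, best_val = a, val
    return best, best_val


if __name__ == "__main__":
    R = [3.0, 4.0]
    c = weights(Q=[2.0, 1.0], Z=[0.0, 0.0], H=[0.0, 0.0], D=[{}, {}],
                omega=["a", "a"], theta_eff=[{}, {}], R=R, networks=[0, 1])
    print(best_suggestion(c, R, C={1: 5.0}, x_b=1.0, available=[{0, 1}, {0, 1}]))
```

In the toy run only one user fits on network 1 (3 + 4 exceeds 5), and the user with the larger backlog $[Q_i]^+$ is the one suggested there, giving `((1, 0), 6.0)`.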
10. A network selection apparatus for integrated cellular and drone-cell networks, comprising: a transceiver, configured to collect capacity information of the drone-cell networks, capacity information of the cellular network, set information of the networks accessible to the users, and data transmission rate information, and to transmit action vector information to the users so that the users can determine the networks to access according to the action vector information; a processor, configured to generate the action vector information according to the capacity information of the drone-cell networks, the capacity information of the cellular network, the set information of the networks accessible to the users, the data transmission rate information and a fourth selection model; wherein the fourth selection model is that the difference between the drift of a total violation and a utility is less than or equal to a penalty upper bound; the drift of the total violation is obtained according to a value of the total violation at a current time slot and a value of the total violation at a next time slot; the value of the total violation at the current time slot is obtained according to a first virtual value at the current time slot, a second virtual value at the current time slot and a third virtual value at the current time slot; the first virtual value in a first virtual queue at the current time slot is generated according to the violation of second coarse correlated equilibrium constraints at a previous time slot and the first virtual value in the first virtual queue at the previous time slot; the second virtual value in a second virtual queue at the current time slot is generated according to the violation of third auxiliary variable constraints at the previous time slot and the second virtual value in the second virtual queue at the previous time slot; the third virtual value in a third virtual queue at the current time slot is generated according to the violation of a second minimum individual time average utility constraint at the previous time slot and the third virtual value in the third virtual queue at the previous time slot, wherein the first virtual value, the second virtual value and the third virtual value at an initial time slot are all zero; a third selection model comprises a third objective function and a third constraint, wherein the third objective function is the time average expectation of a proportional fairness function with third auxiliary variables as independent variables, and the third constraint comprises at least the second coarse correlated equilibrium constraints, the second minimum individual time average utility constraint, a second auxiliary variable constraint, and the third auxiliary variable constraints, wherein the second coarse correlated equilibrium constraints are used for constraining the time average expectation of the individual utilities and the time average expectation of second auxiliary variables, the second minimum individual time average utility constraint is used for constraining the time average expectation of the individual utilities, the second auxiliary variable constraint is used for constraining the second auxiliary variables, and the third auxiliary variable constraints are used for constraining the time average expectation of the third auxiliary variables and the time average expectation of the individual utilities; a first selection model comprises a first objective function and a first constraint, wherein the first objective function is a proportional fairness function with the time average of the individual utilities as independent variables, and the first constraint comprises at least a first coarse correlated equilibrium constraint, a first minimum individual time average utility constraint and a first action probability constraint, wherein the first coarse correlated equilibrium constraint is used for constraining the time average of the individual utilities and first auxiliary variables, the first minimum individual time average utility constraint is used for constraining the time average of the individual utilities, and the first action probability constraint is used for constraining the action probability conditioned on a random event vector; the time average of the individual utilities is obtained according to the individual utilities, a random event probability and the action probability conditioned on the random event vector, wherein the action probability conditioned on the random event vector is the probability that the users execute the action vector given that the random event vector occurs; the individual utility of each user is obtained according to the action vector and the random event vector; and the action vector is generated according to the random event vector, and the random event vector is generated according to a capacity model of the cellular network, a capacity model of the drone-cell networks, the accessible network sets of the users and a transmission rate model, wherein the accessible network sets of the users are generated according to a location model of the drone-cell networks and a location model of the users.
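The claims above do not spell out how the processor sets the second and third auxiliary variables in each slot. One option, sketched here under stated assumptions, is to minimize the terms of the claim-8 bound that involve them. For the third auxiliary variables this means minimizing $-V\varphi(\gamma_1,\ldots,\gamma_N)+\sum_i Z_i\gamma_i$. For the second auxiliary variables it means minimizing $\sum_{\upsilon_i}\phi_{i,\upsilon_i}\big([Q_i]^+-\sum_{\beta_i}[D^{(\beta_i)}_{i,\upsilon_i}]^+\big)$ within $0\leq\phi_{i,\upsilon_i}\leq u_i^{\max}1\{\omega_i(t)=\upsilon_i\}$. The sketch assumes $\varphi(\gamma)=\sum_i\log\gamma_i$ (a common proportional fairness form, not stated in this passage) and $V>0$; the function names `choose_gamma` and `choose_phi` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def choose_gamma(Z: float, V: float, u_max: float) -> float:
    """Minimize -V*log(gamma) + Z*gamma over 0 < gamma <= u_max.

    Assumes the proportional fairness function is a sum of logs and V > 0,
    so the problem separates per user.
    """
    if Z <= 0:
        # Objective is strictly decreasing in gamma: take the upper bound.
        return u_max
    # Convex objective with stationary point V / Z, clipped to the feasible interval.
    return min(V / Z, u_max)


def choose_phi(
    Q: float,
    D_row: Dict[Tuple[Hashable, Hashable], float],
    omega: Hashable,
    types: Iterable[Hashable],
    u_max: float,
) -> Dict[Hashable, float]:
    """Minimize sum_v phi_v * ([Q]^+ - sum_beta [D_{v,beta}]^+)
    subject to 0 <= phi_v <= u_max * 1{omega == v}.

    The objective is linear in phi, so each phi_v sits at one of its bounds.
    """
    phi: Dict[Hashable, float] = {}
    for v in types:
        if v != omega:
            phi[v] = 0.0  # the upper bound is zero when the indicator is zero
            continue
        coeff = max(Q, 0.0) - sum(max(d, 0.0) for (vv, _), d in D_row.items() if vv == v)
        phi[v] = u_max if coeff < 0 else 0.0
    return phi


if __name__ == "__main__":
    print(choose_gamma(Z=2.0, V=10.0, u_max=8.0))
    print(choose_phi(1.0, {("a", 0): 0.5, ("a", 1): 1.0, ("b", 0): 9.0}, "a", ["a", "b"], 4.0))
```

The parameter $V$ trades fairness against constraint satisfaction: a larger $V$ pushes $\gamma_i$ toward $u_i^{\max}$, while a growing backlog $Z_i$ pulls it down toward the utility actually delivered.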